Abstract. We provide a smoothed analysis of Hoare's find algorithm and we
revisit the smoothed analysis of quicksort.

Hoare's find algorithm - often called quickselect - is an easy-to-implement algorithm
for finding the k-th smallest element of a sequence. While the worst-case
number of comparisons that Hoare's find needs is $\Theta(n^2)$, the average-case number
is $\Theta(n)$. We analyze what happens between these two extremes by providing a
smoothed analysis of the algorithm in terms of two different perturbation models:
additive noise and partial permutations.

In the first model, an adversary specifies a sequence of n numbers of [0, 1], and
then each number is perturbed by adding a random number drawn from the
interval [0, d]. We prove that Hoare's find needs $\Theta\big(\frac{n}{d+1}\sqrt{n/d} + n\big)$ comparisons in
expectation if the adversary may also specify the element that we would like to
find. Furthermore, we show that Hoare's find needs fewer comparisons for finding
the median.

In the second model, each element is marked with probability p and then a
random permutation is applied to the marked elements. We prove that the expected
number of comparisons to find the median is $\Omega\big((1-p)\frac{n}{p}\log n\big)$, which is again tight.

Finally, we provide lower bounds for the smoothed number of comparisons of 
quicksort and Hoare's find for the median-of-three pivot rule, which usually yields 
faster algorithms than always selecting the first element: The pivot is the median of 
the first, middle, and last element of the sequence. We show that median-of-three 
does not yield a significant improvement over the classic rule: the lower bounds for 
the classic rule carry over to median-of-three. 



*An extended abstract of this paper will appear in the proceedings of the 15th Int. Computing and Combinatorics
Conference (COCOON 2009).
1 Introduction



To explain the discrepancy between average-case and worst-case behavior of the simplex algo- 
rithm, Spielman and Teng introduced the notion of smoothed analysis [18]. Smoothed analy- 
sis interpolates between average-case and worst-case analysis: Instead of taking a worst-case 
instance, we analyze the expected worst-case running time subject to slight random pertur- 
bations. The more influence we allow for perturbations, the closer we come to the average 
case analysis of the algorithm. Therefore, smoothed analysis is a hybrid of worst-case and 
average-case analysis. 

In practice, neither can we assume that all instances are equally likely, nor that instances 
are precisely worst-case instances. The goal of smoothed analysis is to capture the notion of 
a typical instance mathematically. Typical instances are, in contrast to worst-case instances, 
often subject to measurement or rounding errors. Even if one assumes that nature is adversarial 
and that the instance at hand was initially a worst-case instance, due to such errors we would 
probably get a less difficult instance in practice. On the other hand, typical instances still have 
some (adversarial) structure, which instances drawn completely at random do not. Spielman 
and Teng [19] give a survey of results and open problems in smoothed analysis. 

In this paper, we provide a smoothed analysis of Hoare's find [7] (see also Aho et al. [1,
Algorithm 3.7]), which is a simple algorithm for finding the k-th smallest element of a sequence
of numbers: Pick the first element as the pivot and compare it to all $n - 1$ remaining elements.
Assume that $\ell - 1$ elements are smaller than the pivot. If $\ell = k$, then the pivot is the element
that we are looking for. If $\ell > k$, then we recurse to find the k-th smallest element of the list of
the smaller elements. If $\ell < k$, then we recurse to find the $(k - \ell)$-th smallest element among
the larger elements. The number of comparisons to find the specified element is $\Theta(n^2)$ in the
worst case and $\Theta(n)$ on average. Furthermore, the variance of the number of comparisons is
$\Theta(n^2)$ [8]. As our first result, we close the gap between the quadratic worst-case running-time
and the expected linear running-time by providing a smoothed analysis.
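The procedure just described translates almost line by line into code. The following is a minimal Python sketch, assuming pairwise distinct elements and $1 \le k \le n$; the function name `hoare_find` is ours and only illustrative. It unrolls the recursion into a loop, keeps the relative order of the surviving elements (which is what the first-element pivot rule depends on), and charges $n - 1$ comparisons per round.

```python
def hoare_find(seq, k):
    """Return (k-th smallest element, number of comparisons) for Hoare's find
    with the classic pivot rule. Assumes distinct elements and 1 <= k <= len(seq)."""
    s, comparisons = list(seq), 0
    while True:
        pivot, rest = s[0], s[1:]
        comparisons += len(rest)                    # pivot vs. all n - 1 others
        smaller = [x for x in rest if x < pivot]    # relative order of s is kept
        larger = [x for x in rest if x > pivot]
        ell = len(smaller) + 1                      # ell - 1 elements are smaller
        if ell == k:
            return pivot, comparisons
        if ell > k:
            s = smaller                             # k-th smallest of the smaller ones
        else:
            s, k = larger, k - ell                  # (k - ell)-th smallest of the larger ones
```

On an already sorted input of length 10, asking for the maximum makes every element a pivot in turn, which gives $9 + 8 + \dots + 1 = 45$ comparisons, i.e., the quadratic worst case.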

Hoare's find is closely related to quicksort [6] (see also Aho et al. [1, Section 3.5]), which
needs $\Theta(n^2)$ comparisons in the worst case and $O(n \log n)$ on average [10, Section 5.2.2]. The
smoothed number of comparisons that quicksort needs has already been analyzed [12]. Choosing
the first element as the pivot element, however, results in poor running-time if the sequence
is nearly sorted. There are two common approaches to circumvent this problem: First, one
can choose the pivot randomly among the elements. However, randomness is needed to do
so, which is sometimes expensive. Second, without any randomness, one can compute the
median of the first, middle, and last element of the sequence and then use this median as
the pivot [16, 17]. This method is faster in practice since it yields more balanced partitions
and it makes the worst-case behavior much less likely [10, Section 5.5]. It is also faster both
on average and in the worst case, albeit only by constant factors [5, 15]. Quicksort with the
median-of-three rule is widely used, for instance in the qsort() implementation in the GNU C
library glibc [14] and also in a recent very efficient implementation of quicksort on a GPU [3].
The median-of-three rule has also been used for Hoare's find, and the expected number of
comparisons has been analyzed precisely [9].

Our second goal is a smoothed analysis of both quicksort and Hoare's find with the median- 
of-three rule to get a thorough understanding of this variant of these two algorithms. 
1.1 Preliminaries



We denote sequences of real numbers by $s = (s_1, \dots, s_n)$, where $s_i \in \mathbb{R}$. For $n \in \mathbb{N}$, we set
$[n] = \{1, \dots, n\}$. Let $U = \{i_1, \dots, i_\ell\} \subseteq [n]$ with $i_1 < i_2 < \dots < i_\ell$. Then $s_U = (s_{i_1}, s_{i_2}, \dots, s_{i_\ell})$
denotes the subsequence of s of the elements at positions in U. We denote probabilities by $\mathbb{P}$
and expected values by $\mathbb{E}$.

Throughout the paper, we will assume for the sake of clarity that numbers like $\sqrt{d}$ are
integers and we do not write down the tedious floor and ceiling functions that are actually 
necessary. Since we are interested in asymptotic bounds, this does not affect the validity of 
the proofs. 

Pivot Rules. Given a sequence s, a pivot rule simply selects one element of s as the pivot 
element. The pivot element will be the one to which we compare all other elements of s. In 
this paper, we consider four pivot rules, two of which play only a helper role (the acronyms of 
the rules are in parentheses): 

Classic rule (c): The first element $s_1$ of s is the pivot element.

Median-of-three rule (m3): The median of the first, middle, and last element is the pivot
element, i.e., $\mathrm{median}(s_1, s_{\lceil n/2 \rceil}, s_n)$.

Maximum-of-two rule (max2): The maximum of the first and the last element becomes the
pivot element, i.e., $\max(s_1, s_n)$.

Minimum-of-two rule (min2): The minimum of the first and the last element becomes the
pivot element, i.e., $\min(s_1, s_n)$.
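As a concrete illustration, the four rules can be written as small Python functions. This is a sketch under two assumptions of ours: each function returns the 0-based position of the pivot (rather than its value), and the middle element is taken at position $\lceil n/2 \rceil$. The function names are hypothetical.

```python
import math

# Each rule returns the 0-based position of the pivot within the sequence s.
def classic(s):
    return 0                                    # s_1

def median_of_three(s):
    n = len(s)
    # positions of s_1, s_{ceil(n/2)} and s_n, converted to 0-based indices
    candidates = [0, math.ceil(n / 2) - 1, n - 1]
    return sorted(candidates, key=lambda i: s[i])[1]

def max_of_two(s):
    return 0 if s[0] >= s[-1] else len(s) - 1   # position of max(s_1, s_n)

def min_of_two(s):
    return 0 if s[0] <= s[-1] else len(s) - 1   # position of min(s_1, s_n)
```

Returning positions instead of values keeps the rules usable by any partitioning routine that needs to know which element serves as the pivot.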

The first pivot rule is the easiest-to-analyze and easiest-to-implement pivot rule for quicksort 
and Hoare's find. Its major drawback is that it yields poor running-times of quicksort and 
Hoare's find for nearly sorted sequences. The advantages of the median-of-three rule have
already been discussed above. The last two pivot rules are only used as tools for analyzing the 
median-of-three rule. 

Quicksort, Hoare's Find, Left-to-right Maxima. Let s be a sequence of length n consisting
of pairwise distinct numbers. Let p be the pivot element of s according to some rule.
For the following definitions, let $L = \{i \in \{1, \dots, n\} \mid s_i < p\}$ be the set of positions of elements
smaller than the pivot, and let $R = \{i \in \{1, \dots, n\} \mid s_i > p\}$ be the set of positions of elements
greater than the pivot.

Quicksort is the following sorting algorithm: Given s, we construct $s_L$ and $s_R$ by comparing
all elements to the pivot p. Then we sort $s_L$ and $s_R$ recursively to obtain $s'_L$ and $s'_R$, respectively.
Finally, we output $s' = (s'_L, p, s'_R)$. The number sort(s) of comparisons needed to sort s is thus
$\mathrm{sort}(s) = (n - 1) + \mathrm{sort}(s_L) + \mathrm{sort}(s_R)$ if s has a length of $n \geq 1$, and $\mathrm{sort}(s) = 0$ when s is
the empty sequence. We do not count the number of comparisons needed to find the pivot
element. Since this number is O(1) per recursive call for the pivot rules considered here, this
does not change the asymptotics.
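The recurrence for sort(s) can be evaluated directly. The sketch below is an illustration under our own naming (`sort_comparisons`, `classic_pivot`, `m3_pivot` are hypothetical); it uses an explicit stack instead of recursion so that quadratic worst-case inputs do not hit Python's recursion limit, and, as in the text, it does not count the comparisons spent on selecting the pivot.

```python
import math

def classic_pivot(s):
    return 0                                   # position of s_1

def m3_pivot(s):
    n = len(s)                                 # median of s_1, s_ceil(n/2), s_n
    return sorted([0, math.ceil(n / 2) - 1, n - 1], key=lambda i: s[i])[1]

def sort_comparisons(seq, pivot_rule=classic_pivot):
    """sort(s) = (n - 1) + sort(s_L) + sort(s_R), sort(empty) = 0.
    Comparisons spent on selecting the pivot are not counted."""
    total, stack = 0, [list(seq)]              # explicit stack avoids deep recursion
    while stack:
        s = stack.pop()
        if not s:
            continue
        p = s[pivot_rule(s)]
        total += len(s) - 1                    # p is compared to every other element
        stack.append([x for x in s if x < p])  # s_L, relative order preserved
        stack.append([x for x in s if x > p])  # s_R, relative order preserved
    return total
```

On the sorted sequence $(1, \dots, 10)$ the classic rule needs 45 comparisons, while median-of-three needs only 19, which illustrates why the latter is preferred on nearly sorted inputs.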

Hoare's find aims at finding the k-th smallest element of s. Let $\ell = |s_L|$. If $\ell = k - 1$,
then p is the k-th smallest element. If $\ell \geq k$, then we search for the k-th smallest element
of $s_L$. If $\ell < k - 1$, then we search for the $(k - \ell - 1)$-th smallest element of $s_R$. Let find(s, k)
denote the number of comparisons needed to find the k-th smallest element of s, and let
$\mathrm{find}(s) = \max_{k \in [n]} \mathrm{find}(s, k)$.
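A sketch of find(s, k) with a pluggable pivot rule, following the convention $\ell = |s_L|$ used here, together with $\mathrm{find}(s) = \max_k \mathrm{find}(s, k)$. The names `find_comparisons` and `find_worst` are hypothetical, and distinct elements are assumed.

```python
import math

def classic_pivot(s):
    return 0

def m3_pivot(s):
    n = len(s)
    return sorted([0, math.ceil(n / 2) - 1, n - 1], key=lambda i: s[i])[1]

def find_comparisons(seq, k, pivot_rule=classic_pivot):
    """find(s, k) with ell = |s_L|; assumes distinct elements and 1 <= k <= len(seq)."""
    s, total = list(seq), 0
    while True:
        p = s[pivot_rule(s)]
        total += len(s) - 1
        s_left = [x for x in s if x < p]
        s_right = [x for x in s if x > p]
        ell = len(s_left)
        if ell == k - 1:
            return total                        # p is the k-th smallest element
        if ell >= k:
            s = s_left                          # k-th smallest of s_L
        else:
            s, k = s_right, k - ell - 1         # (k - ell - 1)-th smallest of s_R

def find_worst(seq, pivot_rule=classic_pivot):
    """find(s) = max over k in [n] of find(s, k)."""
    return max(find_comparisons(seq, k, pivot_rule) for k in range(1, len(seq) + 1))
```

Taking the maximum over k reflects the setting in which the adversary also chooses which element is searched for.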

The number of scan maxima of s is the number of maxima seen when scanning s according 
to some pivot rule: let $\mathrm{scan}(s) = 1 + \mathrm{scan}(s_R)$, and let $\mathrm{scan}(s) = 0$ when s is the empty
sequence. If we use the classic pivot rule, the number of scan maxima is just the number of 
left-to-right maxima, i.e., the number of new maxima that we see if we scan s from left to right. 
The number of scan maxima is a useful tool for analyzing quicksort and Hoare's find, and has 
applications, e.g., in motion complexity [4]. 

We write c-scan(s), m3-scan(s), max2-scan(s), and min2-scan(s) to denote the number of 
scan maxima according to the classic, median-of-three, maximum-of-two, or minimum-of-two pivot rule,
respectively. Similar notation is used for quicksort and Hoare's find. 
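The recursion for scan(s) keeps only $s_R$ in each step. A minimal sketch (hypothetical names `scan_maxima` and `left_to_right_maxima`) shows that, under the classic rule, it coincides with counting left-to-right maxima.

```python
def scan_maxima(seq, pivot_rule=lambda s: 0):
    """scan(s) = 1 + scan(s_R), scan(empty) = 0, unrolled into a loop.
    pivot_rule returns the position of the pivot (default: classic rule)."""
    s, count = list(seq), 0
    while s:
        p = s[pivot_rule(s)]
        count += 1
        s = [x for x in s if x > p]            # continue with s_R only
    return count

def left_to_right_maxima(seq):
    """Number of new maxima seen when scanning seq from left to right."""
    count, best = 0, float("-inf")
    for x in seq:
        if x > best:
            count, best = count + 1, x
    return count
```

For $(3, 1, 4, 2, 5, 9, 7, 6)$ both functions return 4, corresponding to the maxima 3, 4, 5, and 9.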

Perturbation Model: Additive noise. The first perturbation model that we consider
is additive noise. Let d > 0. Given a sequence $s \in [0, 1]^n$, i.e., the numbers $s_1, \dots, s_n$ lie
in the interval [0, 1], we obtain the perturbed sequence $\bar s = (\bar s_1, \dots, \bar s_n)$ by drawing $\nu_1, \dots, \nu_n$
uniformly and independently from the interval [0, d] and setting $\bar s_i = s_i + \nu_i$. Note that $d = d(n)$
may be a function of the number n of elements, although this will not always be mentioned
explicitly in the following.
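A minimal sketch of the additive-noise model, together with a Monte Carlo estimator for the expectation of a measure of $\bar s$. The helper names are hypothetical; a simulation only approximates an expectation and is meant as an illustration, not as a replacement for the bounds.

```python
import random

def perturb_additive(s, d, rng):
    """Additive noise: s_bar_i = s_i + nu_i with nu_i drawn i.i.d. from U[0, d]."""
    return [x + rng.uniform(0.0, d) for x in s]

def left_to_right_maxima(seq):
    count, best = 0, float("-inf")
    for x in seq:
        if x > best:
            count, best = count + 1, x
    return count

def estimate_expectation(measure, s, d, trials, seed=0):
    """Monte Carlo estimate of E(measure(s_bar)) for a fixed adversarial s."""
    rng = random.Random(seed)
    return sum(measure(perturb_additive(s, d, rng)) for _ in range(trials)) / trials
```

For the sorted sequence $(1/50, 2/50, \dots, 1)$ and $d = 0.001 < 1/n$, the noise cannot change the order, so every trial sees all 50 elements as left-to-right maxima. For very large d, the count drops to a logarithmic number on average.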

We denote by $\mathrm{scan}_d(s)$, $\mathrm{sort}_d(s)$, and $\mathrm{find}_d(s)$ the (random) number of scan maxima, quicksort
comparisons, and comparisons of Hoare's find of $\bar s$, preceded by the acronym of the pivot rule
used.

Our goal is to prove bounds for the smoothed number of comparisons that Hoare's find
needs, i.e., $\max_{s \in [0,1]^n} \mathbb{E}(\text{c-find}_d(s))$, as well as for Hoare's find and quicksort with the
median-of-three pivot rule, i.e., $\max_{s \in [0,1]^n} \mathbb{E}(\text{m3-find}_d(s))$ and $\max_{s \in [0,1]^n} \mathbb{E}(\text{m3-sort}_d(s))$. The max
reflects that the sequence s is chosen by an adversary.

If d < 1/n, the sequence s can be chosen such that the order of the elements is unaffected 
by the perturbation. Thus, in the following, we assume $d \geq 1/n$. If d is large, the noise will
swamp out the original instance, and the order of the elements of s will basically depend only 
on the noise rather than the original instance. For intermediate d, we interpolate between the 
two extremes. 

The choice of the intervals for the adversarial part and the noise is arbitrary. All that matters
is the ratio of the sizes of the intervals: For a < b, we have
$\max_{s \in [a,b]^n} \mathbb{E}(\mathrm{find}_{d \cdot (b-a)}(s)) = \max_{s \in [0,1]^n} \mathbb{E}(\mathrm{find}_d(s))$. In other words, we can scale (and also shift) the intervals, and the
results depend only on the ratio of the interval sizes and the number of elements. The same
holds for all other measures that we consider. We will exploit this in the analysis of Hoare's
find.
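The scaling argument rests on the fact that all measures depend only on the relative order of the perturbed values, and that $x \mapsto a + (b-a)x$ maps $s_i + \nu_i$ to $(a + (b-a)s_i) + (b-a)\nu_i$, where $(b-a)\nu_i$ is uniform on $[0, d(b-a)]$. The following sketch (hypothetical helper names) checks, for one draw of the noise, that the original and the rescaled instance induce the same ordering.

```python
import random

def rank_pattern(xs):
    """Positions sorted by value; comparison-based measures depend only on this."""
    return sorted(range(len(xs)), key=lambda i: xs[i])

def check_scaling(s, d, a, b, seed=0):
    """Compare the orderings of s + nu (s in [0,1]^n, nu ~ U[0, d]) and of the
    instance mapped to [a, b] with noise (b - a) * nu ~ U[0, d (b - a)]."""
    rng = random.Random(seed)
    nu = [rng.uniform(0.0, d) for _ in s]
    original = [x + v for x, v in zip(s, nu)]
    scaled_s = [a + (b - a) * x for x in s]                     # instance in [a, b]
    scaled = [y + (b - a) * v for y, v in zip(scaled_s, nu)]    # rescaled noise
    return rank_pattern(original) == rank_pattern(scaled)
```

Since the orderings agree for every draw of the noise, all expectations of comparison counts agree as well.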

Perturbation Model: Partial Permutations. The second perturbation model that we 
consider is partial permutations, introduced by Banderier, Beier, and Mehlhorn [2]. Here, the 
elements are left unchanged. Instead, we permute a random subset of the elements.

Without loss of generality, we can assume that s is a permutation of a set of n numbers,
say, $\{1, \dots, n\}$. The perturbation parameter is $p \in [0, 1]$. Any element $s_i$ (or, equivalently,
any position i) is marked independently of the others with a probability of p. After that, all
marked positions are randomly permuted: Let M be the set of positions that are marked, and
let $\pi : M \to M$ be a permutation drawn uniformly at random. Then

$$\bar s_i = \begin{cases} s_{\pi(i)} & \text{if } i \in M, \text{ and} \\ s_i & \text{otherwise.} \end{cases}$$

If p = 0, no element is marked, and we obtain worst-case bounds. If p = 1, all elements are
marked, and $\bar s$ is a uniformly drawn random permutation.
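A minimal sketch of a partial permutation, following the definition above (the function name is hypothetical). Positions are marked independently with probability p, and the values at the marked positions are shuffled among themselves.

```python
import random

def partial_permutation(s, p, rng):
    """Mark each position independently with probability p, then permute the
    marked positions uniformly at random: s_bar_i = s_{pi(i)} for i in M."""
    marked = [i for i in range(len(s)) if rng.random() < p]
    images = marked[:]
    rng.shuffle(images)                       # pi: M -> M, uniformly random
    s_bar = list(s)
    for i, j in zip(marked, images):
        s_bar[i] = s[j]                       # unmarked positions keep s_i
    return s_bar
```

Because `rng.random()` returns values in $[0, 1)$, p = 0 marks nothing and returns s unchanged, while p = 1 marks every position and yields a uniformly random permutation.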

1.2 Known Results 

Additive noise is perhaps the most basic and natural perturbation model for smoothed analysis.
In particular, Spielman and Teng added random numbers to the entries of the adversarial
matrix in their smoothed analysis of the simplex algorithm [18]. Damerow, Meyer auf der
Heide, Räcke, Scheideler, and Sohler [4] analyzed the smoothed number of left-to-right maxima
of a sequence under additive noise. They obtained upper bounds of $O(\sqrt{n/d}\,\log n + \log n)$
for a variety of distributions and a lower bound of $\Omega(\sqrt{n/d} + \log n)$. Manthey and Tantau
tightened their bounds for uniform noise to $\Theta(\sqrt{n/d} + \log n)$. Furthermore, they proved that
the same bounds hold for the smoothed tree height. Finally, they showed that quicksort needs
$O\big(\frac{n}{d+1}\sqrt{n/d} + n \log n\big)$ comparisons in expectation, and this bound is also tight [12].

Banderier et al. [2] introduced partial permutations as a perturbation model for ordering
problems like left-to-right maxima or quicksort. They proved that a sequence of n numbers
has, after partial permutation, an expected number of $O(\sqrt{n/p}\,\log n)$ left-to-right maxima, and
they proved a lower bound of $\Omega(\sqrt{n/p})$ for $p \leq 1/2$. This has later been tightened by Manthey
and Reischuk [11] to $\Theta((1 - p)\sqrt{n/p})$. They transferred this to the height of binary search
trees, for which they obtained the same bounds. Banderier et al. [2] also analyzed quicksort,
for which they proved an upper bound of $O(\frac{n}{p}\log n)$.

1.3 New Results 

We give a smoothed analysis of Hoare's find under additive noise. We consider both finding an
arbitrary element and finding the median. First, we analyze finding arbitrary elements, i.e.,
the adversary specifies k, and we have to find the k-th smallest element (Section 2). For this
variant, we prove tight bounds of $\Theta\big(\frac{n}{d+1}\sqrt{n/d} + n\big)$ for the expected number of comparisons.
This means that already for very small $d \in \omega(1/n)$, the smoothed number of comparisons is
reduced compared to the worst case. If d is a small constant, i.e., the noise is a small percentage
of the data values like 1%, then $O(n^{3/2})$ comparisons suffice.

If the adversary is to choose k, our lower bound suggests that we will have either k = 1 or
k = n. The main task of Hoare's find, however, is to find medians. Thus, second, we give a
separate analysis of how many comparisons are needed to find the median (Section 3). It turns
out that under additive noise, finding medians is arguably easier than finding maximums or
minimums: For $d \leq 1/2$, we have the same bounds as above. For $d \in (1/2, 2)$, we prove a lower
bound of $\Omega(n^{3/2} \cdot (1 - \sqrt{d/2}))$ (Section 3.1), which again matches the upper bound of Section 2, which of
course still applies. For d > 2, we prove that a linear number of comparisons
suffices, which is considerably less than the $\Omega((n/d)^{3/2})$ general lower bound of Section 2. For
the special value d = 2, we prove a tight bound of $\Theta(n \log n)$ (Sections 3.3 and 3.4).

After that, we aim at analyzing different pivot rules, namely the median-of-three rule. As
a tool, we analyze the number of scan maxima under the maximum-of-two, minimum-of-two,
and median-of-three rule (Section 4). We essentially show that the same bounds as for the
classic rule carry over to these rules. Then we apply these findings to quicksort and Hoare's
find (Section 5). Again, we prove a lower bound that matches the lower bound for the classic
rule. Thus, the median-of-three rule does not seem to help much under additive noise.
The results concerning additive noise are summarized in Table 1.

algorithm                  | $d < 1/2$               | $d \in (1/2, 2)$                      | $d = 2$              | $d > 2$
quicksort (c)              | $\Theta(n\sqrt{n/d})$   | $\Theta(n^{3/2})$                     | $\Theta(n^{3/2})$    | $\Theta((n/d)^{3/2})$
quicksort (m3)             | $\Omega(n\sqrt{n/d})$   | $\Omega(n^{3/2})$                     | $\Omega(n^{3/2})$    | $\Omega((n/d)^{3/2})$
Hoare's find (median, c)   | $\Theta(n\sqrt{n/d})$   | $\Omega(n^{3/2}(1 - \sqrt{d/2}))$     | $\Theta(n \log n)$   | $O(\frac{d}{d-2} \cdot n)$
Hoare's find (general, c)  | $\Theta(n\sqrt{n/d})$   | $\Theta(n^{3/2})$                     | $\Theta(n^{3/2})$    | $\Theta((n/d)^{3/2})$
Hoare's find (general, m3) | $\Omega(n\sqrt{n/d})$   | $\Omega(n^{3/2})$                     | $\Omega(n^{3/2})$    | $\Omega((n/d)^{3/2})$
scan maxima (c)            | $\Theta(\sqrt{n/d})$    | $\Theta(\sqrt{n})$                    | $\Theta(\sqrt{n})$   | $\Theta(\sqrt{n/d})$
scan maxima (m3)           | $\Theta(\sqrt{n/d})$    | $\Theta(\sqrt{n})$                    | $\Theta(\sqrt{n})$   | $\Theta(\sqrt{n/d})$

Table 1: Overview of bounds for additive noise. The bounds for quicksort and scan maxima
with classic pivot rule are by Manthey and Tantau [12]. The upper bounds for Hoare's
find in general apply also to Hoare's find for finding the median. Note that, even for
large d, the precise bounds for quicksort, Hoare's find, and scan maxima never drop
below $\Omega(n \log n)$, $\Omega(n)$, and $\Omega(\log n)$, respectively.

Finally, and to contrast our findings for additive noise, we analyze Hoare's find under partial
permutations (Section 6). We prove that there exists a sequence on which Hoare's find needs
an expected number of $\Omega\big((1 - p)\frac{n}{p}\log n\big)$ comparisons. Since this matches the upper bound
for quicksort [2] up to a factor of $O(1 - p)$, this lower bound is essentially tight.

For completeness, Table 2 gives an overview of the results for partial permutations.

algorithm            | bound
quicksort            | $O(\frac{n}{p}\log n)$
Hoare's find         | $\Omega((1 - p)\frac{n}{p}\log n)$
scan maxima          | $\Theta((1 - p)\sqrt{n/p})$
binary search trees  | $\Theta((1 - p)\sqrt{n/p})$

Table 2: Overview of bounds for partial permutations. All results are for the classic pivot rule.
The results about quicksort, scan maxima, and binary search trees are by Banderier
et al. [2] and Manthey and Reischuk [11]. The upper bound for quicksort also holds
for Hoare's find, while the lower bound for Hoare's find also applies to quicksort.

2 Smoothed Analysis of Hoare's Find: General Bounds 

In this section, we prove tight bounds for the smoothed number of comparisons that Hoare's 
find needs using the classic pivot rule. 
Theorem 2.1. For $d \geq 1/n$, we have

$$\max_{s \in [0,1]^n} \mathbb{E}(\text{c-find}_d(s)) \in \Theta\Big(\frac{n}{d+1}\sqrt{n/d} + n\Big).$$

The following subsection contains the proof of the upper bound. After that, we prove the 
lower bound. 

2.1 General Upper Bound for Hoare's Find 

We already have an upper bound for the smoothed number of comparisons that quicksort
needs [12]. This bound is $O\big(\frac{n}{d+1}\sqrt{n/d} + n \log n\big)$, which matches the bound of Theorem 2.1
for $d \in O(n^{1/3} \cdot \log^{-2/3} n)$. We have $\mathrm{find}(s) \leq \mathrm{sort}(s)$ for any s. By monotonicity of the
expectation, this inequality yields $\mathbb{E}(\mathrm{find}_d(s)) \leq \mathbb{E}(\mathrm{sort}_d(s))$. Thus, only the case
$d \in \Omega(n^{1/3} \cdot \log^{-2/3} n)$ remains to be analyzed.

In the next lemma, we show how to analyze the number of comparisons in terms of
subsequences. Lemma 2.3 states that adding a single element to a sequence increases the number
of comparisons at most by an additive O(n). Lemma 2.4 states the actual upper bound.

Lemma 2.2. Let s be a sequence, and let $k \in [n]$. Let j be the position of the k-th smallest
element of s. Let $U_1, \dots, U_m$ be a covering of [n], i.e., $\bigcup_{\ell=1}^{m} U_\ell = [n]$, such that $j \in U_\ell$ for all
$\ell \in [m]$. Let $k_1, \dots, k_m$ be chosen such that $s_j$ is the $k_\ell$-th smallest element of $s_{U_\ell}$. Then

$$\mathrm{find}(s, k) \leq \sum_{\ell=1}^{m} \mathrm{find}(s_{U_\ell}, k_\ell) + Q,$$

where Q is the number of comparisons of positions a and b in find(s, k) such that a and b do
not share a common set in the covering, i.e., $\{a, b\} \not\subseteq U_\ell$ for all $\ell \in [m]$.

Proof. Fix any $\ell \in [m]$, and let a and b be two elements of $s_{U_\ell}$ that are not compared for
finding the $k_\ell$-th smallest element of $s_{U_\ell}$. Without loss of generality, we assume that a < b and
that a appears before b in $s_{U_\ell}$ (and hence in s).

If a is not compared to b, then this is due to one of the following two reasons:

1. There is a c prior to a in $s_{U_\ell}$ such that either $s_j < c < a$ or $a < b < c < s_j$.

2. There is a c in $s_{U_\ell}$ prior to a with $a < c < b$.

In either case, a and b are also not compared while searching for the k-th smallest element of s.
All comparisons are accounted for, either in some $\mathrm{find}(s_{U_\ell}, k_\ell)$ or in Q, which proves the lemma. □

Lemma 2.3. Let s be any sequence of length n, and let s' be obtained from s by inserting one
arbitrary element t at an arbitrary position of s. Then

$$\mathrm{find}(s') \leq \mathrm{find}(s) + n + O(1).$$

Proof. Let $U, U' \subseteq [n + 1]$ such that U contains all positions of elements of s in s' and U'
contains the positions of the target element and of t. We apply Lemma 2.2 with these two sets.
First, $\mathrm{find}(s'_{U'}) \in O(1)$ since U' contains only two elements. Second, $Q \leq n$: we only have to
count the number of comparisons that involve t, and t is compared to any other element of s
at most once. Third, $\mathrm{find}(s'_U) = \mathrm{find}(s)$ since $s = s'_U$. □
Lemma 2.4. Let $d \geq n^{1/3} \cdot \log^{-2/3} n$, and let s be arbitrary. Then

$$\mathbb{E}(\text{c-find}_d(s)) \in O\big((n/d)^{3/2} + n\big).$$

Proof. The key insight is the following observation: Given that an element $\bar s_i$ assumes a value
in [1, d], it is uniformly distributed in this interval.

Let $R = \{i \mid \bar s_i \in [1, d]\}$ be the set of all indices of regular elements, i.e., elements that are
uniformly distributed in [1, d]. Let $F = \{i \mid \nu_i \leq 3\}$ be the set of all elements with noise at most
3, which covers in particular all i that are not in R due to $\bar s_i$ being too small. Analogously, let
$B = \{i \mid \nu_i \geq d - 3\}$ be the set of all elements with noise at least $d - 3$, which includes all i
that are not in R due to $\bar s_i$ being too large. We have $F \cup R \cup B = [n]$.

We prove that the expected values of $\text{c-find}_d(s_F)$, $\text{c-find}_d(s_R)$, $\text{c-find}_d(s_B)$ as well as the
expected number of comparisons between elements in different subsets are bounded from above
by $O((n/d)^{3/2} + n)$. Combining Lemmas 2.2 and 2.3 yields the result. (Lemma 2.3 is necessary
since we have to add the target element to all three sets.)

First, $\mathbb{E}(\text{c-find}_d(s_R)) \in O(n) \subseteq O((n/d)^{3/2} + n)$ since the elements of $s_R$ are uniformly
distributed in [1, d]. Second, $\mathbb{E}(\text{c-find}_d(s_B)) = \mathbb{E}(\text{c-find}_d(s_F))$ since both are identically distributed.
Thus, we can restrict ourselves to $\mathbb{E}(\text{c-find}_d(s_F))$. Given that $i \in F$, the noise $\nu_i$ is uniformly
distributed in [0, 3]. Thus, we can apply the upper bound for quicksort for d = 3, which is
$O(|F|^{3/2})$. The probability that any element is in F is 3/d. By Chernoff's bound [13, Corollary
4.6], the probability that $|F| > 6n/d$ is $\exp(-n^{\varepsilon})$ for some constant $\varepsilon > 0$. If this happens
nevertheless, we bound the number of comparisons by the worst-case bound of $O(n^2)$. Due to
the small probability, however, this contributes only o(1) to the expected value. If F contains
at most 6n/d elements, then we obtain $\mathbb{E}(\text{c-find}_d(s_F)) \in O((n/d)^{3/2})$, which is fine.

Third, and finally, the number of comparisons between elements with $\bar s_i < 1$ and elements
with $\nu_j > 3$ remains to be considered. In the first subcase, we count the number of comparisons
with an element with $\bar s_i < 1$ being the pivot. We observe that $\bar s_i < 1$ is compared to $\bar s_j$ with
$\nu_j > 3$ only if there is no position $\ell < i$ with $\nu_\ell \in [2, 3]$. For every element $\ell$, we have
$\mathbb{P}(\bar s_\ell < 1) \leq 1/d = \mathbb{P}(\nu_\ell \in [2, 3])$. Thus, the probability that we have m elements
$i_1, \dots, i_m$ with $\bar s_{i_z} < 1$ before the first position $\ell$ with $\nu_\ell \in [2, 3]$ is bounded from above by
$2^{-m}$. If we have that many elements, we bound the number of such comparisons by mn. Thus,
an upper bound for the expected number of such comparisons is $\sum_{m \in \mathbb{N}} 2^{-m} m n \in O(n)$. Similarly, the
number of comparisons between elements with $\bar s_i < 1$ and $\bar s_j > d$ (ignoring which of them is
the pivot) is also O(n).

In the second subcase, let us count the number of comparisons between elements with $\nu_j > 3$
and $\bar s_j \leq d$ and elements with $\bar s_i < 1$, with the former being the pivot. An upper bound for this is the
number of comparisons of elements satisfying $\bar s \in [1, d]$ (which is just $s_R$) with elements satisfying
$\bar s_i < 1$. There are at most O(n/d) of the latter by Chernoff's bound (otherwise, we bound the
number of comparisons by $O(n^2)$ again), and only left-to-right minima of $s_R$ can serve as such pivots.
The expected number of left-to-right minima of a sequence of uniformly distributed elements is
$O(\log n)$, resulting in an $O\big(\frac{n \log n}{d}\big) \subseteq O(n)$ bound since $d \geq \log n$. □
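The geometric series behind the O(n) bound in the first subcase can be written out explicitly, using the standard identity $\sum_{m \geq 1} m x^m = x/(1-x)^2$ for $|x| < 1$ with $x = 1/2$:

```latex
\sum_{m \in \mathbb{N}} 2^{-m} \, m \, n
  = n \sum_{m \geq 1} m \Big(\frac{1}{2}\Big)^{m}
  = n \cdot \frac{1/2}{(1 - 1/2)^2}
  = 2n \in O(n).
```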

2.2 General Lower Bound for Hoare's Find 

Now we turn to the general lower bound. The proof is similar to Manthey and Tantau's lower 
bound proof for quicksort [12]. 
Lemma 2.5. For the sequence $s = (\frac{1}{n}, \frac{2}{n}, \frac{3}{n}, \dots, \frac{n/2}{n}, 1, 1, \dots, 1)$, whose last n/2 entries are 1, and all
$d \geq 1/n$, we have

$$\mathbb{E}(\text{c-find}_d(s)) \in \Omega\Big(\frac{n}{d+1}\sqrt{n/d} + n\Big).$$

Proof. We aim at finding the maximum element. Then the pivot elements are just the
left-to-right maxima. As in the analysis of the smoothed number of quicksort comparisons, any
left-to-right maximum $\bar s_i$ of $\bar s$ must be compared to every element of $\bar s$ that is greater than $\bar s_i$,
with $\bar s_i$ being the pivot element. We have an expected number of $\Omega(\sqrt{n/d} + \log n)$ left-to-right
maxima among the first n/2 elements of $\bar s$ [12].

If $d \leq 1/2$, then every element of the second half is greater than any element of the first half.
In this case, an expected number of $\Omega(n \cdot \sqrt{n/d}) = \Omega\big(\frac{n}{d+1}\sqrt{n/d}\big)$ comparisons are needed.

If $d > 1/2$, a sufficient condition that an element $\bar s_i$ ($i > n/2$) is greater than all elements of the
first half is $\nu_i > d - \frac{1}{2}$, which happens with a probability of $\frac{1}{2d}$. Thus, we expect to see $\frac{n}{4d}$ such
elements. Since the number of left-to-right maxima in the first half and the number of elements
$\bar s_i$ with $\nu_i > d - \frac{1}{2}$ in the second half are independent random variables, we can multiply their
expected values to obtain a lower bound of $\Omega\big((\sqrt{n/d} + \log n) \cdot \frac{n}{d}\big)$. If $d \geq \log n$, this is
$\Omega\big(\frac{n}{d+1}\sqrt{n/d}\big)$. If $d < \log n$, then $\sqrt{n/d}$ dominates $\log n$, and we obtain again $\Omega\big(\frac{n}{d+1}\sqrt{n/d}\big)$.

Observing that $\mathbb{E}(\mathrm{find}_d(s))$ never drops below the best-case number of comparisons, which
is $\Omega(n)$, completes the proof. □
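The effect behind Lemma 2.5 can be observed in a small simulation. The sketch below (hypothetical helper names; n = 200 and the two noise levels are arbitrary choices of ours) estimates the comparisons Hoare's find with the classic rule needs to find the maximum of the perturbed sequence. For small d, nearly every element of the first half is a left-to-right maximum and is compared to all large elements. For very large d, the instance behaves like a random permutation.

```python
import random

def find_max_comparisons(seq):
    """Comparisons of Hoare's find (classic rule) when searching for the maximum."""
    s, total = list(seq), 0
    while len(s) > 1:
        p = s[0]
        total += len(s) - 1                     # pivot vs. all remaining elements
        s = [x for x in s if x > p]             # the maximum lies in s_R (or is p)
    return total

def lemma_2_5_sequence(n):
    """(1/n, 2/n, ..., (n/2)/n, 1, ..., 1) with the second half all ones."""
    half = n // 2
    return [i / n for i in range(1, half + 1)] + [1.0] * (n - half)

def mean_find_max(n, d, trials, seed=0):
    """Monte Carlo estimate of E(c-find_d(s, n)) for the sequence above."""
    rng = random.Random(seed)
    s = lemma_2_5_sequence(n)
    total = 0
    for _ in range(trials):
        s_bar = [x + rng.uniform(0.0, d) for x in s]
        total += find_max_comparisons(s_bar)
    return total / trials
```

With n = 200, the estimate for d = 0.01 is many times larger than the estimate for d = 100, in line with the $\frac{n}{d+1}\sqrt{n/d}$ term of the lower bound.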

3 Smoothed Analysis of Hoare's Find: Finding the Median 

In this section, we prove tight bounds for the special case of finding the median of a sequence 
using Hoare's find. Somewhat surprisingly, finding the median seems to be easier in the sense 
that fewer comparisons suffice. 

Theorem 3.1. Depending on d, we have the following bounds for

$$\max_{s \in [0,1]^n} \mathbb{E}(\text{c-find}_d(s, \lceil n/2 \rceil)):$$

For $d \leq 1/2$, we have $\Theta(n \cdot \sqrt{n/d})$. For $1/2 < d < 2$, we have $\Omega((1 - \sqrt{d/2}) \cdot n^{3/2})$ and $O(n^{3/2})$.
For $d = 2$, we have $\Theta(n \cdot \log n)$. Finally, for $d > 2$, we have $O\big(\frac{d}{d-2} \cdot n\big)$.

The upper bounds of $O(n \cdot \sqrt{n/d})$ for $d \leq 1/2$ and $1/2 < d < 2$ follow from our general upper
bound (Theorem 2.1). For $d \leq 1/2$, our lower bound construction for the general bounds also
works: The median is among the last n/2 elements, which are the big ones. (We might want to
have $\lceil n/2 \rceil$ or n/2 + 1 large elements to assure this.) The rest of the proof remains the same.

For d > 2, Theorem 3.1 states a linear bound, which is asymptotically equal to the average-case
bound. Thus, we do not need a lower bound in this case.

In the following sections, we give proofs for the remaining cases. First, we prove the lower
bound for $1/2 < d < 2$ (Section 3.1). Then we prove the upper bound for d > 2 (Section 3.2).
Finally, we prove the bound of $\Theta(n \log n)$ for d = 2 in Sections 3.3 and 3.4.

3.1 Lower Bound for d < 2 

We will prove lower bounds matching our general upper bound of $O\big(\frac{n}{d+1}\sqrt{n/d}\big)$. Since d < 2,
this equals $O(n \cdot \sqrt{n/d})$. We already have a bound for $d \leq 1/2$, thus we can restrict ourselves to
$1/2 < d < 2$. The idea is similar to the lower bound construction for quicksort [12].
Lemma 3.2. Let $1/2 < d < 2$. Then there exists a family $(s^{(n)})_{n \in \mathbb{N}}$, where $s^{(n)}$ has length n,
such that

$$\mathbb{E}(\text{c-find}_d(s^{(n)}, \lceil n/2 \rceil)) \in \Omega\big((1 - \sqrt{d/2}) \cdot n^{3/2}\big).$$

Proof. Let

$$s = s^{(n)} = \Big(\frac{1}{n}, \frac{2}{n}, \dots, \frac{a}{n}, \underbrace{1, \dots, 1}_{b \text{ times}}\Big)$$

with a + b = n, where a and b will be chosen later on. We will refer to the first a elements, which
have values of i/n, as the small elements and to the last b elements, all of which are of value 1, as
the large elements. The probability that a particular element of the large ones is greater than
all small elements in $\bar s$ is at least $\frac{1 - a/n}{d} = \frac{b}{nd}$. Thus, we expect to see $b \cdot \frac{b}{nd}$ such elements. In order
to get our lower bound, we want the median of $\bar s$ to be among the large elements. For that
purpose, we need $b \cdot \frac{1 - a/n}{d} \geq \frac{n}{2}$, which is equivalent to $b \geq \frac{nd}{2(1 - a/n)} = \frac{n^2 d}{2n - 2a} = \frac{n^2 d}{2b}$. Thus, we need
$b \geq n \cdot \sqrt{d/2}$. (Note that, since $b \leq n$, this requirement makes our construction impossible for
$d \geq 2$.)

We obtain the following: With constant probability, at least n/2 of the large elements are
greater than all small elements of $\bar s$. In this case, every left-to-right maximum of the small
elements has to be compared to at least n/2 elements. The lower bound for the number of
left-to-right maxima under uniform noise yields

$$\mathbb{E}\Big(\text{c-scan}_d\Big(\frac{1}{n}, \dots, \frac{a}{n}\Big)\Big) = \mathbb{E}\Big(\text{c-scan}_{dn/a}\Big(\frac{1}{a}, \dots, \frac{a}{a}\Big)\Big) \in \Omega\big(\sqrt{a^2/(dn)}\big),$$

which in turn gives us

$$\mathbb{E}(\text{c-find}_d(s, \lceil n/2 \rceil)) \in \Omega\big(n \cdot \sqrt{a^2/(dn)}\big) = \Omega\big(a \cdot \sqrt{n/d}\big).$$

The only restriction on a comes from $b \geq n \cdot \sqrt{d/2}$, which allows us only to choose
$a \leq n \cdot (1 - \sqrt{d/2})$. This, however, yields the result. □
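Spelled out, the last step combines the two facts used in the proof: each of the $\Omega(\sqrt{a^2/(dn)})$ left-to-right maxima of the small elements is compared to at least n/2 elements, and a may be as large as $n(1 - \sqrt{d/2})$. Since $d < 2$, we have $\sqrt{n/d} > \sqrt{n/2}$, so

```latex
\frac{n}{2} \cdot \sqrt{\frac{a^2}{dn}}
  = \frac{a}{2}\sqrt{\frac{n}{d}}
  = \frac{n}{2}\Big(1 - \sqrt{d/2}\Big)\sqrt{\frac{n}{d}}
  \geq \frac{1}{2\sqrt{2}}\Big(1 - \sqrt{d/2}\Big) n^{3/2}
  \in \Omega\Big(\big(1 - \sqrt{d/2}\big) \cdot n^{3/2}\Big).
```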



3.2 Upper Bound for d > 2 

In this section, we prove that the expected number of comparisons that Hoare's find needs in 
order to find the median is linear for any d > 2, with the constant factor depending on d. 

First, we prove a crucial fact about the value of the median: Intuitively, the median should 
be around d/2 if all elements of s are 0, and it should be around 1 + d/2 if all elements of s 
are 1. For arbitrary input sequences s, it should be between these two extremes. We make 
this more precise: Independent of the input sequence, the median will be neither much smaller 
than d/2 nor much greater than 1 + d/2 with high probability. This lemma will also be needed
in Section 3.3, where we prove an upper bound for the case d = 2.



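Lemma 3.3. Let $s \in [0, 1]^n$, and let d > 0. Let $\varepsilon = c\sqrt{\log n / n}$. Let m be the median of $\bar s$.
Then

$$\mathbb{P}\Big(m \notin \Big[\frac{d}{2} - \varepsilon, 1 + \frac{d}{2} + \varepsilon\Big]\Big) \leq 4 \cdot \exp\Big(-\frac{2c^2 \log n}{3d^2}\Big).$$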
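Proof. Let $b = \frac{d}{2} - \varepsilon$. We restrict ourselves to proving $\mathbb{P}(m < b) \leq 2 \cdot \exp\big(-\frac{2c^2 \log n}{3d^2}\big)$. The
other bound follows by symmetry. Fix any i. The probability that $\bar s_i < b$ is $\max\{0, \frac{b - s_i}{d}\} \leq \frac{b}{d}$. If m < b,
then at least n/2 elements must be smaller than b. The expected number of such elements is at most $\frac{nb}{d}$.
Thus, we can apply Chernoff's bound [13, Corollary 4.6] with $\delta = \varepsilon/b$ (note that $(1 + \delta) \cdot \frac{nb}{d} = \frac{n}{2}$) and obtain

$$\mathbb{P}(m < b) \leq \mathbb{P}(\text{at least } n/2 \text{ elements are smaller than } b) \leq 2 \exp\Big(-\frac{\varepsilon^2 n}{3bd}\Big) = 2 \exp\Big(-\frac{c^2 \log n}{3bd}\Big) \leq 2 \exp\Big(-\frac{2c^2 \log n}{3d^2}\Big),$$

where the last inequality uses $b \leq d/2$.

□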
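Lemma 3.3 can be sanity-checked by simulation. The sketch below (hypothetical names) perturbs a fixed sequence, takes the $\lceil n/2 \rceil$-th smallest value via `statistics.median_low`, and counts how often it leaves the window $[d/2 - \varepsilon, 1 + d/2 + \varepsilon]$. We assume the natural logarithm in $\varepsilon = c\sqrt{\log n / n}$ and use c = 2d, the choice made in the proof of Lemma 3.4; the extreme inputs "all zeros" and "all ones" are illustrative choices of ours.

```python
import math
import random
import statistics

def median_in_window(s, d, c, rng):
    """Perturb s with U[0, d] noise and test whether the median of s_bar lies in
    [d/2 - eps, 1 + d/2 + eps] with eps = c * sqrt(log n / n) (natural log assumed)."""
    n = len(s)
    eps = c * math.sqrt(math.log(n) / n)
    s_bar = [x + rng.uniform(0.0, d) for x in s]
    m = statistics.median_low(s_bar)          # the ceil(n/2)-th smallest element
    return d / 2 - eps <= m <= 1 + d / 2 + eps

def failure_rate(s, d, c, trials, seed=0):
    """Fraction of trials in which the median falls outside the window."""
    rng = random.Random(seed)
    misses = sum(not median_in_window(s, d, c, rng) for _ in range(trials))
    return misses / trials
```

For n = 1001 and d = 3, the median stays close to d/2 for all-zero inputs and close to 1 + d/2 for all-one inputs, well inside the window.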

The idea to prove the upper bound for d > 2 is as follows: Since d > 2 and according to
Lemma 3.3 above, it is likely that any element can assume a value greater or smaller than the
median. Thus, after we have seen a few pivots (for which we "pay" with $O\big(\frac{d}{d-2} \cdot n\big)$
comparisons), all elements that are not already cut off are within some small interval around
the median. These elements are uniformly distributed. Thus, the linear average-case bound
applies.

Lemma 3.4. Let d > 2 be bounded away from 2. Then

$$\max_{s \in [0,1]^n} \mathbb{E}(\text{c-find}_d(s, \lceil n/2 \rceil)) \in O\Big(\frac{d}{d-2} \cdot n\Big).$$



Proof. We can assume that $d \in o(\sqrt{n / \log n})$: For larger values of d, we already have a linear
bound by Theorem 2.1. Let $\varepsilon = 2d\sqrt{\log n / n}$. By Lemma 3.3, the median of $\bar s$ falls into the
interval $[\frac{d}{2} - \varepsilon, 1 + \frac{d}{2} + \varepsilon]$ with a probability of at least $1 - 4n^{-8/3}$. If the median does not
fall into this interval, we bound the number of comparisons by the worst-case bound of $O(n^2)$,
which contributes only o(1) to the expected value.

The key observation to get the linear bound is the following: Every element of $\bar s$ can assume any value in the interval $[1, d]$. Thus, with a probability of at least $\frac{d/2 - \xi - 1}{d}$, it assumes a value smaller than the median but larger than 1 (called a low cutter). Analogously, with a probability of at least $\frac{d/2 - \xi - 1}{d}$, it assumes a value greater than the median but smaller than $d$ (called a high cutter).

Now assume that we have already seen a low cutter $a$ and a high cutter $b$. Then any element that remains to be considered is uniformly distributed in the interval $[a, b]$. Thus, the average-case bound applies, and we expect to need only $O(n)$ additional comparisons.

Until we have seen both a low and a high cutter, we bound the number of comparisons by the trivial upper bound of $n$ per iteration. Let $c_\ell$ be the position of the first low cutter and let $c_h$ be the position of the first high cutter. Then, in this way, we get a bound of $\max(c_\ell, c_h) \cdot n + O(n)$. The values of $c_\ell$ and $c_h$ remain to be bounded.

The probability that an element is either a low or a high cutter is at least $2 \cdot \frac{d/2 - \xi - 1}{d} = \frac{d - 2\xi - 2}{d}$. Thus, the expected number of elements until we have seen at least one cutter is at most $\frac{d}{d - 2\xi - 2}$. Analogously, given that we have seen one cutter, the position of the second cutter (of the other type) is an expected number of at most $\frac{2d}{d - 2\xi - 2}$ positions to the right. Thus, the expected number of elements until we have both a low and a high cutter is at most

$$\mathbb{E}(\max(c_\ell, c_h)) \le \frac{3d}{d - 2\xi - 2} = O\left(\frac{d}{d-2}\right),$$

where the equality holds since $d \in o(\sqrt{n/\log n})$, which implies $\xi \in o(1)$, and since $d$ is bounded away from 2.

□
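
To experiment with Lemma 3.4, one can simulate Hoare's find under additive noise. The sketch below is a hypothetical implementation, assuming that the classic rule takes the first element of the current sequence as pivot and that one comparison is charged for every other element in each round (one less than the trivial bound of $n$ per iteration used above). It does not reproduce the cutter bookkeeping of the proof; it only estimates $\mathbb{E}(\text{c-find}_d(s, \lceil n/2\rceil))/n$, which Lemma 3.4 predicts to stay bounded when $d$ is bounded away from 2.

```python
import random


def c_find(seq, k):
    """Hoare's find with the classic pivot rule (first element of the
    current sequence). k is 1-based. Returns the k-th smallest element and
    the number of element-to-pivot comparisons."""
    current = list(seq)
    comparisons = 0
    while True:
        pivot, rest = current[0], current[1:]
        comparisons += len(rest)  # each remaining element is compared once
        smaller = [x for x in rest if x < pivot]  # relative order is kept
        larger = [x for x in rest if x >= pivot]
        if k <= len(smaller):
            current = smaller
        elif k == len(smaller) + 1:
            return pivot, comparisons
        else:
            k -= len(smaller) + 1
            current = larger


def smoothed_median_cost(s, d, trials=20, seed=1):
    """Monte Carlo estimate of E(c-find_d(s, ceil(n/2))) when every s_i
    receives independent noise drawn uniformly from [0, d]."""
    rng = random.Random(seed)
    k = (len(s) + 1) // 2  # ceil(n/2)
    total = 0
    for _ in range(trials):
        perturbed = [x + rng.uniform(0, d) for x in s]
        total += c_find(perturbed, k)[1]
    return total / trials


if __name__ == "__main__":
    n = 400
    s = [i / n for i in range(n)]  # sorted input: the worst case without noise
    for d in (3.0, 4.0, 8.0):
        print(d, smoothed_median_cost(s, d) / n)  # comparisons per element
```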



3.3 Upper Bound for d = 2 

In this section, we prove that the expected number of comparisons for finding the median in case of $d = 2$ is $O(n \log n)$, which matches the lower bound of the next section. Before we dive into the actual proof, we will rule out two bad cases by showing that each one of these occurs only with a probability of at most $O(n^{-3/2})$. If one of the bad events happens, then we bound the number of comparisons by the worst-case bound of $O(n^2)$. This contributes only $O(n^{1/2})$ to the expected value, which is negligible.

First, with a probability of at most $O(n^{-2})$, there is an interval of length $\frac1n$ that contains more than $4 \log n$ elements of the perturbed sequence. Second, with a probability of at most $O(n^{-3/2})$, the median is larger than 2, provided that there are more than $12\sqrt{n \log n}$ elements of the original (unperturbed) sequence $s$ that are smaller than $1/2$.

Lemma 3.5. Let $s \in [0,1]^n$. Then

$$\mathbb{P}\left(\exists a \in [0,3] \text{ such that } \left|\{i : \bar s_i \in [a, a + \tfrac1n]\}\right| > 4\log n\right) \le 6n^{-2}.$$

Proof. Consider an arbitrary interval $I = [a, a + \frac{1}{2n}]$. Then the probability that an element $\bar s_i$ falls into $I$ is at most $\frac{1}{4n}$. The expected number of elements in $I$ is therefore at most $\frac{1}{4n} \cdot n = \frac14$. Let $X$ denote the number of elements in $I$. Chernoff's bound [13, Corollary 4.6] yields

$$\mathbb{P}\left(X > \tfrac43 \log n\right) \le \exp(-\Omega(\log n)) \le n^{-3}.$$

If there exists an interval of length $\frac1n$ that contains more than $4\log n$ elements, then there must exist an interval $[\frac{j}{2n}, \frac{j+1}{2n}]$ of length $\frac{1}{2n}$ that contains more than $\frac43 \log n$ elements, since an interval of length $\frac1n$ intersects at most three such intervals. There are $6n$ intervals of the latter kind. Thus, a union bound yields that the probability that there exists an interval of length $\frac1n$ that contains more than $4\log n$ elements is bounded from above by $6n \cdot n^{-3} \in O(n^{-2})$. □
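
The covering argument in the proof of Lemma 3.5 can be checked directly on a sample. In the sketch below (function names are illustrative, not from the paper), `max_in_window` computes the largest number of values in any closed interval of a given length, and `max_in_grid_cell` the largest count among the $6n$ grid intervals of length $\frac{1}{2n}$ that tile $[0,3]$. Because every interval of length $\frac1n$ meets at most three grid intervals, the first quantity is at most three times the second.

```python
import math
import random


def max_in_window(values, width):
    """Largest number of values inside a closed interval [a, a + width],
    found with a two-pointer sweep over the sorted values."""
    xs = sorted(values)
    best, left = 0, 0
    for right, x in enumerate(xs):
        while x - xs[left] > width:
            left += 1
        best = max(best, right - left + 1)
    return best


def max_in_grid_cell(values, n, upper=3.0):
    """Largest count among the cells [j/(2n), (j+1)/(2n)) that tile
    [0, upper]; for upper = 3 these are the 6n cells of the proof."""
    cells = round(2 * n * upper)
    counts = [0] * cells
    for v in values:
        counts[min(int(v * 2 * n), cells - 1)] += 1  # clamp v == upper
    return max(counts)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 2000
    perturbed = [rng.random() + rng.uniform(0, 2) for _ in range(n)]  # d = 2
    print(max_in_window(perturbed, 1 / n), max_in_grid_cell(perturbed, n),
          4 * math.log(n))
```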

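Lemma 3.6. Let $d = 2$. Assume that the unperturbed sequence $s$ contains at least $12\sqrt{n\log n}$ elements that are smaller than $1/2$. Then the probability that the median of the perturbed sequence is greater than 2 is at most $O(n^{-3/2})$.

Proof. Let $\ell = 12\sqrt{n \log n}$. Since the median is a monotone function of the elements of the sequence, we can assume without loss of generality that $s$ contains exactly $\ell$ elements that are smaller than $1/2$. Let $X$ denote the number of elements in the perturbed sequence $\bar s$ that are larger than 2. Then

$$\frac n5 \le \mathbb{E}(X) \le \frac12(n - \ell) + \frac14 \ell = \frac n2 - \frac \ell4,$$

where the first inequality holds since, for sufficiently large $n$, at least $n - \ell \ge \frac45 n$ elements are greater than $1/2$, and each of them exceeds 2 after perturbation with probability at least $\frac14$. Chernoff's bound [13, Corollary 4.6] yields

$$\begin{aligned}
\mathbb{P}(\text{median is larger than } 2) &= \mathbb{P}(X > n/2) \\
&= \mathbb{P}\left(X > \left(1 + \frac{\ell}{2n - \ell}\right)\left(\frac n2 - \frac\ell4\right)\right) \\
&\le \mathbb{P}\left(X > \left(1 + \frac{\ell}{2n}\right)\mathbb{E}(X)\right) \\
&\le 2\exp\left(-\frac{\mathbb{E}(X)\,\ell^2}{12n^2}\right) \le 2\exp\left(-\frac{\ell^2}{60n}\right) \\
&= 2\exp\left(-\frac{12\log n}{5}\right) \in O(n^{-3/2}).
\end{aligned}$$

□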
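
The two estimates in the proof of Lemma 3.6 can be evaluated numerically. The sketch assumes natural logarithms; `expected_above_two` uses the fact that, for $d = 2$ and $s_i \in [0,1]$, $\mathbb{P}(s_i + \nu_i > 2) = s_i/2$, and `lemma36_tail_bound` evaluates $2\exp(-\ell^2/(60n)) = 2n^{-12/5}$. Both names are hypothetical. The condition $n - \ell \ge \frac45 n$ only holds for fairly large $n$ (roughly $n \ge 4 \cdot 10^4$ with natural logarithms), which is why the example uses $n = 50\,000$.

```python
import math


def expected_above_two(s):
    """E(X) for d = 2: X counts i with s_i + nu_i > 2, nu_i uniform on
    [0, 2], so P(s_i + nu_i > 2) = s_i / 2 for s_i in [0, 1]."""
    return sum(x / 2 for x in s)


def lemma36_tail_bound(n):
    """2 * exp(-(n/5) * delta^2 / 3) with delta = ell / (2n) and
    ell = 12 * sqrt(n log n); natural logarithm assumed."""
    ell = 12 * math.sqrt(n * math.log(n))
    delta = ell / (2 * n)
    return 2 * math.exp(-(n / 5) * delta ** 2 / 3)


if __name__ == "__main__":
    n = 50_000
    ell = int(12 * math.sqrt(n * math.log(n)))
    s = [0.49] * ell + [0.5] * (n - ell)  # exactly ell entries below 1/2
    print(n / 5, expected_above_two(s), n / 2 - ell / 4)
    print(lemma36_tail_bound(n), n ** -1.5)
```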
We are now ready to prove the upper bound on the number of comparisons for $d = 2$.

Lemma 3.7. We have

$$\max_{s \in [0,1]^n} \mathbb{E}\left(\text{c-find}_2(s, \lceil n/2 \rceil)\right) \in O(n \log n).$$

Proof. By Lemmas 3.3, 3.5, and 3.6, each of the following events occurs only with a probability of $O(n^{-3/2})$:

• The median of $\bar s$ does not belong to the interval $[1 - \xi, 2 + \xi]$ for $\xi = 4\sqrt{\log n / n}$.

• There is an interval of length $\frac1n$ that contains more than $4\log n$ elements.

• Given that there are more than $12\sqrt{n\log n}$ elements smaller than $\frac12$ in the original sequence, the median is nevertheless larger than 2.

If any of these events happens nevertheless, we bound the number of comparisons by the trivial bound of $O(n^2)$. This contributes only $O(n^{1/2})$ to the expected value, which is negligible. In the following, we assume that no bad event happens.

Let $m$ denote the median. We distinguish between large elements, which are larger than $m$, and small elements, which are smaller than $m$. To gain a better intuition, we view the random process that generates $\bar s$ as follows. As before, we first generate $\bar s$ and then process it from left to right. In particular, this fixes the median $m$, and it also fixes which elements are small and which are large. During this first process, we assume that no bad event happens. Now, in the second step, we redraw certain elements without changing the overall probability distribution: When a large pivot element $\bar s_i$ is encountered, we not only delete all elements larger than $\bar s_i$, but we also redraw every large element $\bar s_j < \bar s_i$ uniformly at random from the interval $[m, \min\{\bar s_i, s_j + 2\}]$. Similarly, when a small pivot element $\bar s_i$ is encountered, we not only delete all elements smaller than $\bar s_i$, but also redraw every small element $\bar s_j > \bar s_i$ uniformly at random from the interval $[\max\{\bar s_i, s_j\}, \min\{m, s_j + 2\}]$. This does not change the distribution of $\bar s$.

We now argue that the number of pivot elements is in $O(\log n)$. Since every pivot element is compared to at most $n$ other elements, this yields the desired bound of $O(n\log n)$ comparisons.

Note that a small element becomes a pivot element if and only if it is a left-to-right maximum among the sequence of small elements (up to the point where the median itself becomes the pivot). Similarly, a large element is a pivot element if and only if it is a left-to-right minimum among the sequence of large elements. We determine the number of left-to-right minima and maxima separately. By symmetry, we can assume $m \ge 1.5$. We first deal with the number of pivot elements among the large elements. If at some point all large elements lie in an interval of length $\frac1n$, then we know that there are at most $O(\log n)$ large elements remaining. In total, these elements can only contribute $O(n\log n)$ comparisons. We show that we only need a logarithmic number of iterations to ensure that all remaining large elements lie in such a small interval. So, in total, only a logarithmic number of large elements become pivot elements.
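
The characterization of pivot elements used above can be verified exhaustively for small inputs. The following sketch (hypothetical helper names; distinct values assumed) runs Hoare's find with the first-element rule and compares the pivots it uses with the prediction: before the target $t$ appears, a small element is a pivot exactly when it is a left-to-right maximum among the small elements, and a large element exactly when it is a left-to-right minimum among the large elements. Elements after $t$ never become pivots, because the search stops once $t$ is the pivot.

```python
from itertools import permutations


def find_pivots(seq, k):
    """Pivots chosen by Hoare's find (first-element rule) when looking for
    the k-th smallest element; values are assumed to be distinct."""
    current, pivots = list(seq), []
    while True:
        pivot = current[0]
        pivots.append(pivot)
        smaller = [x for x in current[1:] if x < pivot]
        larger = [x for x in current[1:] if x > pivot]
        if k <= len(smaller):
            current = smaller
        elif k == len(smaller) + 1:
            return pivots
        else:
            k -= len(smaller) + 1
            current = larger


def predicted_pivots(seq, k):
    """Characterization: before the target t appears, small elements that
    are left-to-right maxima among the small ones and large elements that
    are left-to-right minima among the large ones; t is the last pivot."""
    t = sorted(seq)[k - 1]
    pivots, best_small, best_large = [], None, None
    for x in seq[: seq.index(t)]:
        if x < t and (best_small is None or x > best_small):
            best_small = x
            pivots.append(x)
        elif x > t and (best_large is None or x < best_large):
            best_large = x
            pivots.append(x)
    return pivots + [t]


def characterization_holds(n):
    """Exhaustive comparison over all permutations of 1..n and all k."""
    return all(
        find_pivots(list(p), k) == predicted_pivots(list(p), k)
        for p in permutations(range(1, n + 1))
        for k in range(1, n + 1)
    )


if __name__ == "__main__":
    print(characterization_holds(6))
```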

Lemma 3.8. After $12\log n$ iterations, all remaining large elements lie in an interval of length $\frac1n$ with probability at least $1 - n^{-8/3}$.

Proof. Let $\bar s^\ell_i$ denote the $i$-th large pivot element. Let $[m, c]$ denote the interval for which $\bar s^\ell_i$ is eligible. (A random number is eligible for an interval if it can take any value in this interval.) By construction, $\bar s^\ell_i$ is drawn uniformly at random from this interval. So, with a probability of $\frac12$, it lies in the first half of its interval, i.e., $\mathbb{P}(\bar s^\ell_i \in [m, \frac{m+c}{2}]) = \frac12$.

After processing at most $12\log n$ large pivot elements, we will have encountered, with sufficiently high probability, at least $2\log n$ pivot elements that lie in the first half of their eligible interval. In particular, let $X$ denote the number of pivot elements among the first $12\log n$ large pivot elements that lie in the first half of their interval. Then, by Chernoff's bound,

$$\mathbb{P}(X < 2\log n) \le n^{-8/3}.$$

Each of these at least $2\log n$ large pivot elements halves the interval for which all the remaining large elements are eligible. Since this interval initially has constant length, the interval containing all large elements finally has length at most $O(2^{-2\log n}) = O(n^{-2}) \subseteq o(\frac1n)$. □
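
For concrete values of $n$, the Chernoff estimate in the proof of Lemma 3.8 can be compared with the exact binomial tail. The sketch below assumes natural logarithms and rounds $12\log n$ down to an integer; both choices are ours, not the paper's. It computes $\mathbb{P}(X < 2\log n)$ for $X \sim \mathrm{Bin}(\lfloor 12\log n\rfloor, \frac12)$ exactly with `math.comb` (the helper names are hypothetical).

```python
import math


def binomial_lower_tail(trials, threshold):
    """Exact P(X < threshold) for X ~ Bin(trials, 1/2)."""
    favourable = sum(math.comb(trials, j) for j in range(trials + 1) if j < threshold)
    return favourable / 2 ** trials


def lemma38_check(n):
    """Return (P(X < 2 ln n), n^(-8/3)) with X ~ Bin(floor(12 ln n), 1/2),
    i.e. the number of pivots that land in the lower half of their interval."""
    log_n = math.log(n)
    return binomial_lower_tail(math.floor(12 * log_n), 2 * log_n), n ** (-8 / 3)


if __name__ == "__main__":
    for n in (10, 1_000, 1_000_000):
        print(n, *lemma38_check(n))
```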

Hence, the case when the remaining interval of the large elements is larger than $\frac1n$ only contributes $o(1)$ comparisons to the expected number of comparisons.

It remains to bound the number of small pivot elements. For that purpose, we distinguish between the cases $m \le 2$ and $m > 2$. If $m \le 2$, then, by the same line of reasoning as in the proof of Lemma 3.8, we need at most $O(\log n)$ small pivot elements until we have a pivot element larger than $m - \frac1n$. There are only $O(\log n)$ elements in the interval $[m - \frac1n, m]$, which contributes again $O(\log n)$ pivot elements.

The case $m > 2$ remains to be considered. All the remaining small elements lie in $[2, m] \subseteq [2, 2 + \xi]$ with $\xi = 4\sqrt{\log n / n}$. The reason why we cannot apply the same argument for the remaining interval is that there might be small elements that are not eligible for the whole interval, and so we cannot ensure that in each iteration the interval almost halves. However, intuitively, most small elements should indeed be eligible for the whole interval. In fact, only elements $\bar s_j$ with $s_j < \xi < \frac12$ could possibly fail to be eligible for the whole interval. Since $m > 2$ and we have ruled out the third bad event, there are at most $12\sqrt{n\log n}$ elements smaller than $\frac12$ in the original sequence. Each of them falls into $[2, m]$ with probability at most $\xi/2$, so there are, in expectation, only $O(\sqrt{n\log n} \cdot \sqrt{\log n / n}) = O(\log n)$ elements that are not eligible for the whole remaining interval $[2, m]$. Thus, they contribute only $O(n\log n)$ comparisons. All the other small elements are eligible for the whole interval $[2, m]$, so, by the same line of reasoning as in Lemma 3.8, we conclude that after encountering $O(\log n)$ such pivot elements, the remaining interval is of size $\frac1n$. By assumption, such an interval only contains $O(\log n)$ elements, which completes the proof. □

3.4 Lower Bound for d = 2 

In this section, we show that the upper bound for $d = 2$ is actually tight. The main idea behind the following result is as follows: First, to get the lower bound, we have to make sure that the median is close to 1 or close to 2. Otherwise, if the median is bounded away from 1 and 2, then a reasoning along the lines of Lemma 3.4 would yield a linear upper bound. We choose the sequence such that the median is roughly 2. To do this, most elements are set to 1. Only the first few elements (few here means $n^{1/4}$) are set to 0. They yield $\Theta(\log n)$ left-to-right maxima in expectation, and all these become pivot elements. Each of these pivot elements contributes a linear number of comparisons.

Lemma 3.9. There exists a sequence $s$ of length $n$ with

$$\mathbb{E}\left(\text{c-find}_2(s, \lceil n/2 \rceil)\right) \in \Omega(n \cdot \log n).$$
Proof. Consider the sequence

$$s = (\underbrace{0, 0, \ldots, 0}_{n^{1/4} \text{ times}}, \underbrace{1, 1, \ldots, 1}_{n - n^{1/4} \text{ times}}).$$

The probability that the first $n^{1/4}$ elements of $\bar s$ are at most $2 - n^{-1/4}$ is

$$\left(\frac{2 - n^{-1/4}}{2}\right)^{n^{1/4}} = \left(1 - \frac{1}{2n^{1/4}}\right)^{n^{1/4}} \ge \frac12.$$

The probability that one particular element of the last $n - n^{1/4}$ elements is greater than $2 - n^{-1/4}$ is $\frac{1 + n^{-1/4}}{2}$. Thus, for sufficiently large $n$, we expect to see

$$\frac{1 + n^{-1/4}}{2} \cdot (n - n^{1/4}) = \frac{n + n^{3/4} - n^{1/4} - 1}{2} \ge \frac n2$$

such elements. Hence, with constant probability, at least $n/2$ of the last $n - n^{1/4}$ elements of $\bar s$ are greater than all of the first $n^{1/4}$ elements of $\bar s$. Both observations together imply that the following two properties hold with constant probability:

1. The median of $\bar s$ is among the last $n - n^{1/4}$ elements.

2. All left-to-right maxima of the first $n^{1/4}$ elements of $\bar s$ have to be compared to all elements greater than $2 - n^{-1/4}$, and there are at least $n/2$ such elements.

The number of left-to-right maxima of the first $n^{1/4}$ elements of $\bar s$ is expected to be $H_{n^{1/4}} \in \Theta(\log n)$, which proves the lemma. □
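
Two quantitative facts in the proof of Lemma 3.9 are easy to check mechanically: the expected number of left-to-right maxima of $q$ values in random order is the harmonic number $H_q$, and $(1 - \frac{1}{2q})^q \ge \frac12$, where $q = n^{1/4}$. The sketch below (hypothetical helper names) verifies the first by enumerating all orderings with exact fractions and evaluates the second in floating point.

```python
from fractions import Fraction
from itertools import permutations


def ltr_maxima(seq):
    """Number of left-to-right maxima of seq."""
    count, best = 0, None
    for x in seq:
        if best is None or x > best:
            best, count = x, count + 1
    return count


def average_ltr_maxima(k):
    """Exact average over all k! orderings of k distinct values."""
    perms = list(permutations(range(k)))
    return Fraction(sum(ltr_maxima(p) for p in perms), len(perms))


def harmonic(k):
    """H_k = 1 + 1/2 + ... + 1/k as an exact fraction."""
    return sum((Fraction(1, j) for j in range(1, k + 1)), Fraction(0))


def zeros_stay_low(q):
    """P(all q noisy zeros are at most 2 - 1/q) = (1 - 1/(2q))^q for d = 2."""
    return (1 - 1 / (2 * q)) ** q


if __name__ == "__main__":
    for k in range(1, 7):
        print(k, average_ltr_maxima(k), harmonic(k))
```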



4 Scan Maxima with Median-of-three Rule 

The results in this section serve as a basis for the analysis of both quicksort and Hoare's find with the median-of-three rule. In order to analyze the number of scan maxima with the median-of-three rule, we analyze this number with the maximum-of-two and the minimum-of-two rules. The following lemma justifies this approach.

Lemma 4.1. For every sequence $s$, we have

$$\text{max2-scan}(s) \le \text{m3-scan}(s) \le \text{min2-scan}(s).$$

Proof. Let us focus on the first inequality. The proof of the second then follows immediately along the same lines.

Let $m = (m_1, m_2, \ldots)$ be the pivot elements according to the median-of-three rule, i.e., $m_1 = \operatorname{median}(s_1, s_{\lceil n/2 \rceil}, s_n)$, $m_2$ is the median of the first, middle, and last element of the sequence containing all elements greater than $m_1$, and so on. Likewise, let $m' = (m'_1, m'_2, \ldots)$ be the pivot elements according to the maximum-of-two rule.

Now our aim is to prove that $m'_i \ge m_i$ for all $i$. Since we take left-to-right maxima until all elements are removed, in particular the maximum of $s$ must be an element in both sequences $m$ and $m'$. Thus, $m$ is at least as long as $m'$, which proves the lemma.
Figure 1: $T_i$ consists of the $2\sqrt{nd}$ positions following position $i$ and preceding the $i$-th last position, which is $n - i + 1$. We estimate the probability that (1 and 2) none of the elements drawn with horizontal lines gets a huge noise added to it and (3) at least one of the elements drawn in crosshatch gets a huge noise and becomes a scan maximum.



The proof of $m'_i \ge m_i$ is by induction on $i$. The case $i = 1$ follows from $\max(s_1, s_n) \ge \operatorname{median}(s_1, s_{\lceil n/2 \rceil}, s_n)$.

Now let $s'$ and $s''$ be the sequences of elements that are greater than $m_{i-1}$ and $m'_{i-1}$, respectively. Let $t$ and $t'$ be their lengths. By the induction hypothesis, $m_{i-1} \le m'_{i-1}$. Thus, $s''$ is a subsequence of $s'$. The only elements that $s'$ contains that are not part of $s''$ are the elements of value at most $m'_{i-1}$.

We have $m'_i = \max(s''_1, s''_{t'})$ and $m_i = \operatorname{median}(s'_1, s'_{\lceil t/2 \rceil}, s'_t) \le \max(s'_1, s'_t)$. Now either $s'_1 = s''_1$ or $s'_1 \le m'_{i-1} < s''_1$. The same holds for $s'_t$ and $s''_{t'}$, which proves the lemma. □
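
The pivot rules compared in Lemma 4.1 can be written as small functions of the current sequence. The following sketch, with illustrative names, counts scan maxima under each rule (keep only the elements greater than the pivot, preserving their order) and checks $\text{max2-scan}(s) \le \text{m3-scan}(s) \le \text{min2-scan}(s)$ on all permutations of a small ground set. It assumes, as in the text, that the middle element is the $\lceil t/2\rceil$-th one.

```python
from itertools import permutations


def scan(seq, rule):
    """Number of scan maxima: repeatedly pick a pivot with `rule` and keep
    only the elements greater than the pivot (order is preserved)."""
    current, pivots = list(seq), 0
    while current:
        pivot = rule(current)
        pivots += 1
        current = [x for x in current if x > pivot]
    return pivots


def classic(c):
    return c[0]


def max2(c):
    return max(c[0], c[-1])


def min2(c):
    return min(c[0], c[-1])


def m3(c):
    # median of the first, the ceil(t/2)-th and the last element
    return sorted([c[0], c[(len(c) + 1) // 2 - 1], c[-1]])[1]


def lemma41_holds(n):
    """Check max2-scan <= m3-scan <= min2-scan for all permutations of 0..n-1."""
    return all(
        scan(p, max2) <= scan(p, m3) <= scan(p, min2)
        for p in permutations(range(n))
    )


if __name__ == "__main__":
    print(lemma41_holds(7))
```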

The reason for considering max2-scan and min2-scan is that it is hard to keep track of where the middle element lies with the median-of-three rule: Depending on which element actually becomes the pivot and which elements are greater than the pivot, the new middle position can be on the far left or on the far right of the previous middle.

Let us first prove a lower bound for the number of scan maxima. 

Lemma 4.2. There exists a sequence $s$ such that for all $d \ge 1/n$, we have

$$\mathbb{E}\left(\text{max2-scan}_d(s)\right) \in \Omega\left(\sqrt{\frac nd} + \log n\right).$$

Proof. For simplicity, we assume that $n$ is even. Let

$$s = \left(\frac1n, \frac2n, \ldots, \frac{n/2 - 1}{n}, \frac12, \frac12, \frac{n/2 - 1}{n}, \ldots, \frac2n, \frac1n\right).$$

Let

$$T_i = \{i + 1, i + 2, \ldots, i + 2\sqrt{nd}\} \cup \{n - i, n - i - 1, \ldots, n - i - 2\sqrt{nd} + 1\}$$

be the set of the $2\sqrt{nd}$ indices following $i$ plus the $2\sqrt{nd}$ indices preceding position $n - i + 1$. Note that $s_{T_i}$ for $i < n/2 - 2\sqrt{nd}$ contains the corresponding values of the first and second half of $s$.

Let us estimate the probability that at least one element of $T_i$ becomes a left-to-right maximum. If this probability is constant, then we immediately obtain a lower bound of $\Omega(\sqrt{n/d})$ by linearity of expectation. (It then still remains to prove the $\Omega(\log n)$ lower bound.)

Assume that there exist indices $j < j'$ such that $\bar s_k < \min(\bar s_j, \bar s_{j'})$ for all $k < j$ and $k > j'$. Then at least one of them becomes a left-to-right maximum.

Fix any $i < \frac n2 - 2\sqrt{nd}$. Figure 1 shows $T_i$ and illustrates the event whose probability we want to estimate now. Remember that $\nu_i$ denotes the additive noise at position $i$. Assume the following holds:

1. $\nu_{i+1}, \ldots, \nu_{i+\sqrt{nd}} \le d - \sqrt{d/n}$.

2. $\nu_{n-i}, \nu_{n-i-1}, \ldots, \nu_{n-i-\sqrt{nd}+1} \le d - \sqrt{d/n}$.

3. There exist $j, j' \in T_i$ such that $\nu_j, \nu_{j'} > d - \sqrt{d/n}$.

Choose $j$ to be minimal and $j'$ to be maximal. Then $j > i + \sqrt{nd}$ and $j' < n - i - \sqrt{nd}$. If the three properties above are fulfilled, then, by the choice of $j$ and $j'$, we have $\min(\bar s_j, \bar s_{j'}) > \bar s_k$ for all $k < j$ and $k > j'$: For $k \in T_i$, this follows from the minimality of $j$ and the maximality of $j'$. For $k \notin T_i$, $k < n/2$, we have $\bar s_k = \frac kn + \nu_k \le \frac in + d = \frac{i + \sqrt{nd}}{n} + d - \sqrt{\frac dn} < \bar s_j$ by the fact that $\nu_j > d - \sqrt{d/n}$.

Furthermore, $j$ or $j'$ is a left-to-right maximum: Suppose not; then there must exist a $k < j$ or a $k > j'$ that becomes a pivot which causes positions $j$ and/or $j'$ to vanish. This, however, contradicts the property shown above. Thus, if the three properties are fulfilled, we have a left-to-right maximum in $T_i$.

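Let us estimate the probability that this happens. We have

$$\mathbb{P}\left(\nu_{i+1}, \ldots, \nu_{i+\sqrt{nd}} \le d - \sqrt{d/n}\right) = \left(1 - \frac{1}{\sqrt{nd}}\right)^{\sqrt{nd}} \ge \frac14$$

if $\sqrt{nd} \ge 2$. The latter is fulfilled if $d \ge 4/n$. If $d = c/n$ is smaller, we easily get a lower bound of $\Omega(n)$ by restricting the adversary to the interval $[0, c/4]$: We can apply the bound for $d = 4/n$ by scaling.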



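By symmetry, also

$$\mathbb{P}\left(\nu_{n-i}, \ldots, \nu_{n-i-\sqrt{nd}+1} \le d - \sqrt{d/n}\right) \ge \frac14.$$

Furthermore,

$$\mathbb{P}\left(\exists j \in \{i + \sqrt{nd} + 1, \ldots, i + 2\sqrt{nd}\} : \nu_j > d - \sqrt{d/n}\right) = 1 - \left(1 - \frac{1}{\sqrt{nd}}\right)^{\sqrt{nd}} \ge 1 - \frac1e,$$

and the same lower bound holds for the probability that there exists a $j' \in T_i$ as described above. These events concern disjoint sets of positions and are therefore independent. Overall, the probability that $j$ and $j'$ exist and the three properties hold is constant, which proves the lower bound of $\Omega(\sqrt{n/d})$.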

To finish the proof, let us prove that, on average, we expect to see $\Omega(\log n)$ scan maxima. To do this, let us consider the sequence $s = (0, 0, \ldots, 0)$. We obtain $\bar s$ by adding noise from $[0, d]$. The ordering of the elements in $\bar s$ is now a uniformly distributed random permutation. We take a different view on the maximum-of-two pivot rule: We take $\bar s_1$, get half a point for it, and eliminate all elements smaller than $\bar s_1$. If $\bar s_n$ has also been eliminated, then we have completed this iteration. Otherwise, we take $\bar s_n$, get another half point, and again eliminate all smaller elements.

The number of scan maxima of $\bar s$ is at least the number of points we get. Since the elements of $\bar s$ appear in random order, the expected number of points is $\frac12 \cdot H_n$, where $H_n$ is the average-case number of left-to-right maxima. □
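
For tiny $n$, the expected number of max2 scan maxima of a random ordering can be computed exactly by enumeration and set against $\frac12 H_n$. The sketch below is only an illustration under that assumption (exhaustive enumeration is feasible only for small $n$, say $n \le 8$); the names are hypothetical. For $n = 3$, the exact expectation is $4/3$, while $\frac12 H_3 = 11/12$.

```python
from fractions import Fraction
from itertools import permutations


def max2_scan(seq):
    """Scan maxima under the maximum-of-two rule."""
    current, pivots = list(seq), 0
    while current:
        pivot = max(current[0], current[-1])
        pivots += 1
        current = [x for x in current if x > pivot]
    return pivots


def average_max2_scan(n):
    """Exact expectation over a uniformly random ordering of n distinct values."""
    perms = list(permutations(range(n)))
    return Fraction(sum(max2_scan(p) for p in perms), len(perms))


def half_harmonic(n):
    """H_n / 2 as an exact fraction."""
    return sum((Fraction(1, j) for j in range(1, n + 1)), Fraction(0)) / 2


if __name__ == "__main__":
    for n in range(1, 8):
        print(n, float(average_max2_scan(n)), float(half_harmonic(n)))
```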

Now we turn to the upper bound for scan maxima. 
Lemma 4.3. For all sequences $s$ and $d \ge 1/n$, we have

$$\mathbb{E}\left(\text{min2-scan}_d(s)\right) \in O\left(\sqrt{\frac nd} + \log n\right).$$

Proof. First, we observe that a necessary condition for an element $\bar s_i$ to become a pivot element is that it is either a left-to-right maximum (according to the usual rule), i.e., no element $\bar s_j$ for $j < i$ is greater than $\bar s_i$, or that it is a right-to-left maximum, i.e., no element $\bar s_j$ for $j > i$ is greater than $\bar s_i$.

Hence, an upper bound for $\text{min2-scan}(\bar s)$ is $\text{c-scan}(\bar s)$ plus the number of right-to-left maxima. The expectation of the former is at most $O(\sqrt{n/d} + \log n)$, and the latter can be analyzed in exactly the same way. Thus, the lemma follows. □
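
The necessary condition in the proof of Lemma 4.3 can be tested on all permutations of a small set. In the sketch below (hypothetical names; distinct values assumed), every pivot chosen by the minimum-of-two rule is checked to be a left-to-right or a right-to-left maximum of the original sequence, which also bounds min2-scan by the sum of the two counts.

```python
from itertools import permutations


def min2_pivots(seq):
    """Pivots chosen by the minimum-of-two rule (scan version)."""
    current, pivots = list(seq), []
    while current:
        pivot = min(current[0], current[-1])
        pivots.append(pivot)
        current = [x for x in current if x > pivot]
    return pivots


def ltr_maxima(seq):
    """Set of left-to-right maxima of seq."""
    result, best = set(), None
    for x in seq:
        if best is None or x > best:
            best = x
            result.add(x)
    return result


def rtl_maxima(seq):
    """Set of right-to-left maxima of seq."""
    return ltr_maxima(list(reversed(seq)))


def lemma43_holds(n):
    """Every min2 pivot is an LTR or RTL maximum, for all permutations of 0..n-1."""
    for p in permutations(range(n)):
        left, right = ltr_maxima(p), rtl_maxima(p)
        pivots = min2_pivots(p)
        if not set(pivots) <= left | right:
            return False
        if len(pivots) > len(left) + len(right):
            return False
    return True


if __name__ == "__main__":
    print(lemma43_holds(7))
```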

From Lemmas 4.1, 4.2, and 4.3, we immediately get tight bounds for the number of scan maxima with the median-of-three rule.

Theorem 4.4. For every $d \ge 1/n$, we have

$$\max_{s \in [0,1]^n} \mathbb{E}\left(\text{m3-scan}_d(s)\right) \in \Theta\left(\sqrt{\frac nd} + \log n\right).$$



5 Quicksort and Hoare's Find with Median-of-three Rule 

Now we use our results about scan maxima from the previous section to prove lower bounds for the number of comparisons that quicksort and Hoare's find need using the median-of-three pivot rule. We only prove lower bounds here since they already match the upper bounds for the classic pivot rule. We strongly believe that the median-of-three rule does not yield worse bounds than the classic rule and, hence, that our bounds are tight. The main goal of this section is to prove the following result for Hoare's find. This bound then carries over to quicksort.

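Theorem 5.1. For $d \ge 1/n$, we have

$$\max_{s \in [0,1]^n} \mathbb{E}\left(\text{m3-find}_d(s)\right) \in \Omega\left(\frac{n}{d+1}\sqrt{\frac nd} + n\right).$$

Proof. We use the maximum-of-two rule to prove this lower bound. To this end, consider the following sequence: Let $A = \{1, \ldots, \frac n3\} \cup \{\frac{2n}{3} + 1, \ldots, n\}$ and let $s$ be defined by

$$s_i = \begin{cases} \frac{\min(i,\ n - i + 1)}{n} & \text{if } i \in A, \text{ and} \\ 1 & \text{otherwise.} \end{cases}$$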

Figure 2 gives an intuition of what $s$ looks like. We observe that $s_A$ is, up to scaling, identical to the sequence used in Lemma 4.2. To analyze the number of comparisons, we distinguish between small and large values of $d$.

First, assume that $d \le \frac23$. Then all elements of $\bar s_{[n] \setminus A}$ are greater than all elements of $\bar s_A$, including the scan maxima of $\bar s_A$. From Lemma 4.1 and the proof of Lemma 4.2, we know that $\bar s_A$ contains $\Omega(\sqrt{n/d} + \log n)$ scan maxima. Each of these maxima has to be compared to all of the $n/3$ elements of $\bar s_{[n] \setminus A}$, resulting in $\Omega(n \cdot (\sqrt{n/d} + \log n))$ comparisons.
Figure 2: What the sequence of Theorem 5.1 looks like (three consecutive blocks of $n/3$ elements each). The black elements contribute scan maxima, the white elements are large elements. All black scan maxima have to be compared to all or at least $\Omega(n/d)$ white elements.



The second case is $d > \frac23$. Again, there are $\Omega(\sqrt{n/d} + \log n)$ scan maxima under the maximum-of-two rule in $\bar s_A$, which carry over to $\bar s$. According to Lemma 4.1, there are at least that many median-of-three scan maxima (m3 maxima) in $\bar s$, but since $d$ may be greater than $\frac23$, some of the m3 maxima may be from $\bar s_{[n] \setminus A}$. This poses no harm because only the magnitude of the pivots is relevant to the process, not their position. In turn, the magnitude of an m3 maximum is at most the magnitude of the corresponding maximum-of-two scan maximum (max2 maximum).

We can now bound the number of comparisons appropriately. The probability that an element $\bar s_i$ ($i \in [n] \setminus A$) is greater than the first $\Omega(\sqrt{n/d} + \log n)$ m3 maxima is at least the probability that it is greater than all elements of $\bar s_A$, where the corresponding max2 maxima are located, i.e.,

$$\mathbb{P}\left(\bar s_i > \text{first } \Omega(\sqrt{n/d} + \log n) \text{ m3 maxima}\right) \ge \mathbb{P}\left(1 + \nu_i > \frac13 + d\right) = \frac{2}{3d}.$$

Thus, by linearity of expectation, an expected number of $\Omega(n/d)$ elements of $\bar s_{[n] \setminus A}$ are greater than the first $\Omega(\sqrt{n/d} + \log n)$ m3 maxima and have to be compared to all of them. This requires $\Omega(\frac nd \cdot \sqrt{\frac nd})$ comparisons. Since we always need at least $\Omega(n)$ comparisons, the theorem follows. □
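
The construction of Theorem 5.1 can be simulated with a median-of-three variant of Hoare's find. The sketch below is hypothetical: it takes the median of the first, the $\lceil t/2\rceil$-th, and the last element of the current sequence as pivot, does not count the comparisons needed to determine that median, assumes that $n$ is divisible by 3, and searches for the maximum ($k = n$) by default, which is our choice for illustration rather than the paper's. It reports the average number of element-to-pivot comparisons.

```python
import random
from itertools import permutations


def m3_find(seq, k):
    """Hoare's find with the median-of-three rule; k is 1-based. Returns the
    k-th smallest element and the number of element-to-pivot comparisons
    (comparisons spent on choosing the median of three are not counted)."""
    current, comparisons = list(seq), 0
    while True:
        t = len(current)
        candidates = [0, (t + 1) // 2 - 1, t - 1]  # first, ceil(t/2)-th, last
        idx = sorted(candidates, key=lambda j: current[j])[1]
        pivot = current[idx]
        rest = current[:idx] + current[idx + 1:]
        comparisons += len(rest)
        smaller = [x for x in rest if x < pivot]
        larger = [x for x in rest if x >= pivot]
        if k <= len(smaller):
            current = smaller
        elif k == len(smaller) + 1:
            return pivot, comparisons
        else:
            k -= len(smaller) + 1
            current = larger


def theorem51_sequence(n):
    """s_i = min(i, n - i + 1) / n for i in A = {1..n/3} and {2n/3+1..n},
    s_i = 1 otherwise (1-based positions, n divisible by 3)."""
    third = n // 3
    return [min(i, n - i + 1) / n if i <= third or i > 2 * third else 1.0
            for i in range(1, n + 1)]


def smoothed_m3_cost(n, d, k=None, trials=10, seed=2):
    """Monte Carlo estimate of E(m3-find_d(s, k)) for the sequence above."""
    rng = random.Random(seed)
    s = theorem51_sequence(n)
    k = n if k is None else k
    total = 0
    for _ in range(trials):
        total += m3_find([x + rng.uniform(0, d) for x in s], k)[1]
    return total / trials


if __name__ == "__main__":
    for d in (0.1, 0.5, 2.0, 8.0):
        print(d, smoothed_m3_cost(300, d))
```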

Since the number of comparisons that Hoare's find needs is a lower bound for the number 
of quicksort comparisons, we immediately get the following result for quicksort. 

Corollary 5.2. For $d \ge 1/n$, we have

$$\max_{s \in [0,1]^n} \mathbb{E}\left(\text{m3-sort}_d(s)\right) \in \Omega\left(\frac{n}{d+1}\sqrt{\frac nd} + n\log n\right).$$

Proof. The result follows from Theorem 5.1 and the observation that quicksort always requires at least $\Omega(n\log n)$ comparisons. □



6 Hoare's Find Under Partial Permutations 

To complement our findings about Hoare's find, we analyze the number of comparisons subject to partial permutations. For this model, we already have an upper bound of $O(\frac np \log n)$, since that bound has been proved for quicksort by Banderier et al. [2].

We show that this is asymptotically tight (up to factors depending only on $p$) by proving that Hoare's find needs a smoothed number of $\Omega((1 - p) \cdot \frac np \cdot \log n)$ comparisons.
The main idea behind the proof of the following theorem is as follows: We aim at finding the median. The first few elements are close to and smaller than the median (few means roughly $\Theta((n/p)^{1/4})$). Thus, it is unlikely that one of them is permuted further to the left. This implies that all unmarked elements among the first few become pivot elements. Then we observe that they have to be compared to many of the $\Omega(n)$ elements larger than the median, which yields our lower bound.

Theorem 6.1. Let $p \in (0, 1)$ be a constant. There exist sequences $s$ of length $n$ such that under partial permutations we have

$$\mathbb{E}\left(\text{c-find}_p(s)\right) \in \Omega\left((1 - p) \cdot \frac np \cdot \log n\right).$$



Proof. For simplicity, we restrict ourselves to odd $n$ and permutations of $-m, -m + 1, \ldots, m$ for $2m + 1 = n$. This means that 0 is the median of the sequence. Let $Q = (m/p)^{1/4}$. We consider the sequence

$$s = (-Q, -Q + 1, \ldots, -1, -m, \ldots, -Q - 1, 1, \ldots, m, 0).$$

The important part of $s$ is the first $Q$ elements. All other elements might as well be in any other order.

Assume that the unperturbed element $s_i = -Q + i - 1$ ($i \le Q$) becomes a pivot and is unmarked. The latter happens with a probability of $1 - p$. The former means that all marked elements among $-Q + i, \ldots, -1$ are permuted further to the right (more precisely: not to the left of position $i$). Let

$$M_i = \min\left(\{\bar s_j \mid \bar s_j > 0,\ j < i\} \cup \{m + 1\}\right).$$

Then $\bar s_i$ contributes $M_i$ comparisons. (Actually, at least $M_i + Q - i$ comparisons, but we ignore the $Q - i$ since it does not contribute to the asymptotics.) Let $E_i^k$ be the event that the $i$-th position is unmarked, $\bar s_i = s_i$ becomes a pivot, and $M_i \ge k$. Using lower bounds for $\mathbb{P}(E_i^k)$, we get a lower bound for the expected number of comparisons.

Let $A$ be the number of marked positions prior to $i$, let $B$ be the number of marked elements among $-Q + i, \ldots, -1$ and among $0, \ldots, k$, and let $N$ be the total number of marked elements.

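Given this and $A \le B$, the probability of $E_i^k$ is

$$\begin{aligned}
W_k &= (1 - p) \cdot \frac{N - A}{N} \cdot \frac{N - A - 1}{N - 1} \cdots \frac{N - A - B + 1}{N - B + 1} \\
&\ge (1 - p) \cdot \left(\frac{N - A - B}{N}\right)^A = (1 - p) \cdot \exp\left(A \cdot \ln\left(1 - \frac{A + B}{N}\right)\right) \\
&\ge (1 - p) \cdot \exp\left(-\frac{2A(A + B)}{N}\right) \ge (1 - p) \cdot \exp\left(-\frac{4AB}{N}\right).
\end{aligned}$$

The first inequality holds since $A \le B$ and therefore most factors cancel each other out. The second inequality holds since $\ln(1 - x) \ge -2x$ for $x \in [0, \frac12]$. The third inequality holds again since $A \le B$.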
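
The inequality chain for $W_k$ can be checked numerically over a grid of parameters. In the sketch below (illustrative names), `no_collision_probability` evaluates the product of the $B$ factors (without the factor $1 - p$), and `chain_holds` verifies the three inequalities for $A \le B$ and $\frac{A+B}{N} \le \frac12$, the range in which $\ln(1 - x) \ge -2x$ is used. A small tolerance absorbs floating-point rounding in the equality cases ($A = 0$ or $A = B$).

```python
import math


def no_collision_probability(N, A, B):
    """(N-A)/N * (N-A-1)/(N-1) * ... * (N-A-B+1)/(N-B+1): the chance that
    none of B marked values is placed on one of A given marked positions
    when all N marked values are permuted uniformly at random."""
    prob = 1.0
    for j in range(B):
        prob *= (N - A - j) / (N - j)
    return prob


def chain_holds(N, A, B, eps=1e-12):
    """Check the three inequalities of the chain (without the factor 1 - p)."""
    w = no_collision_probability(N, A, B)
    b1 = ((N - A - B) / N) ** A
    b2 = math.exp(-2 * A * (A + B) / N)
    b3 = math.exp(-4 * A * B / N)
    return w >= b1 - eps and b1 >= b2 - eps and b2 >= b3 - eps


def check_grid(max_n=60):
    """All N <= max_n and A <= B with A + B <= N / 2."""
    return all(
        chain_holds(N, A, B)
        for N in range(1, max_n + 1)
        for B in range(N + 1)
        for A in range(B + 1)
        if 2 * (A + B) <= N
    )


if __name__ == "__main__":
    print(check_grid(), no_collision_probability(100, 5, 10))
```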

This bound is monotonically decreasing in $A$ and $B$, and monotonically increasing in $N$. Thus, we need upper bounds for $A$ and $B$ and a lower bound for $N$. Now let $1/p < i < Q - 1/p$, and let $k \ge \sqrt{m/p}$. With a probability of $\Omega(1)$, at most $2pi$ positions prior to $i$ and at most $2p(Q - i)$ positions after $i$ and before $Q$ are marked. Furthermore, at least $\frac{pn}{2}$ positions overall are marked, and at most $2pk$ elements among $0, \ldots, k$ are marked. The last two requirements happen with a probability close to 1. This yields $A \le 2pi$, $B \le 2pk + 2p(Q - i) \le 3pk$ as well as $N \ge \frac{pn}{2}$. Since $i > 1/p$ and $Q - i > 1/p$, the probability that all these bounds are satisfied is at least a constant $c > 0$. This allows us to bound $W_k$ as follows:

$$W_k \ge c \cdot (1 - p) \cdot \exp\left(-\frac{48pki}{n}\right).$$



Let $K_i = \exp\left(-\frac{48pi}{n}\right)$. We observe that $K_i^{\sqrt{m/p}} \ge c' \in \Omega(1)$. Using this to bound the expected number of comparisons, we get that the expected number of comparisons with the unmarked $\bar s_i$ as the pivot element is at least

$$\sum_{k=\sqrt{m/p}}^{m} W_k \ge cc' \cdot (1 - p) \cdot \sum_{k=0}^{m/2} K_i^k = cc' \cdot (1 - p) \cdot \frac{1 - K_i^{m/2 + 1}}{1 - K_i} \ge \frac{cc'}{2} \cdot (1 - p) \cdot \frac{1}{1 - K_i} \ge \frac{cc'}{2} \cdot (1 - p) \cdot \frac{n}{96pi}.$$

We use the linearity of expectation, sum over all $i$ with $1/p < i < Q - 1/p$, where $Q = (m/p)^{1/4}$, and get the desired bound. □
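
The partial-permutation model and the sequence from the proof of Theorem 6.1 can be simulated as follows. This is a hypothetical sketch: $Q = (m/p)^{1/4}$ is rounded to an integer, the classic first-element rule is used, and the cost is the number of element-to-pivot comparisons. It estimates the expected cost of finding the median $0$, i.e., the $(m+1)$-st smallest element.

```python
import random


def partial_permutation(seq, p, rng):
    """Mark every position independently with probability p and apply a
    uniformly random permutation to the elements at the marked positions."""
    marked = [i for i in range(len(seq)) if rng.random() < p]
    values = [seq[i] for i in marked]
    rng.shuffle(values)
    out = list(seq)
    for i, v in zip(marked, values):
        out[i] = v
    return out


def c_find(seq, k):
    """Hoare's find with the first-element rule; returns (k-th smallest,
    number of element-to-pivot comparisons), k is 1-based."""
    current, comparisons = list(seq), 0
    while True:
        pivot, rest = current[0], current[1:]
        comparisons += len(rest)
        smaller = [x for x in rest if x < pivot]
        larger = [x for x in rest if x >= pivot]
        if k <= len(smaller):
            current = smaller
        elif k == len(smaller) + 1:
            return pivot, comparisons
        else:
            k -= len(smaller) + 1
            current = larger


def theorem61_sequence(m, p):
    """(-Q, ..., -1, -m, ..., -Q-1, 1, ..., m, 0) with Q = (m/p)^(1/4)
    rounded to an integer (a simplification made in this sketch)."""
    q = max(1, round((m / p) ** 0.25))
    return list(range(-q, 0)) + list(range(-m, -q)) + list(range(1, m + 1)) + [0]


def smoothed_median_cost(m, p, trials=10, seed=3):
    """Monte Carlo estimate of the expected cost of finding the median 0."""
    rng = random.Random(seed)
    s = theorem61_sequence(m, p)
    total = 0
    for _ in range(trials):
        total += c_find(partial_permutation(s, p, rng), m + 1)[1]
    return total / trials


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.6):
        print(p, smoothed_median_cost(1000, p))
```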

For completeness, to conclude this section, and as a contrast to Sections 2 and 3, let us remark that for partial permutations, finding the maximum using Hoare's find actually seems to be easier than finding the median: The lower bound constructed above for finding the median needed elements on either side of the element we aim for. If we aim at finding the maximum, all elements are on the same side of the target element. In fact, we believe that for finding the maximum, an expected number of $O(f(p) \cdot n)$ comparisons suffices for some function $f$ depending on $p$.



7 Concluding Remarks 

We have shown tight bounds for the smoothed number of comparisons for Hoare's find under 
additive noise and under partial permutations. Somewhat surprisingly, it turned out that, 
under additive noise, Hoare's find needs (asymptotically) more comparisons for finding the 
maximum than for finding the median. Furthermore, we analyzed quicksort and Hoare's find 
with the median-of-three pivot rule, and we proved that median-of-three does not yield an 
asymptotically better bound. Let us remark that the lower bounds for left-to-right maxima as well as for the height of binary search trees [11] can also be transferred to median-of-three. The bounds remain equal in terms of the number $n$ of elements.

A natural question regarding additive noise is what happens when the noise is drawn according to an arbitrary distribution rather than the uniform distribution. Some first results on this for left-to-right maxima were obtained by Damerow et al. [4]. We conjecture the following: If the adversary is allowed to specify a density function bounded by $\phi$, then all upper bounds still hold with $d = 1/\phi$ (the maximum density of the uniform distribution on $[0, d]$ is $1/d$). However, as Manthey and Tantau point out [12], a direct transfer of the results for uniform noise to arbitrary noise might be difficult.